


THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 


                                                                 PAGE

Additive Partition Functions and a Class of Statistical Hypotheses.
    J. WOLFOWITZ ................................................ 247

On the Theory of Testing Composite Hypotheses with One Constraint.
    HENRY SCHEFFÉ ............................................... 280

On the Problem of Multiple Matching. I. L. BATTIN ............... 294

On the Choice of the Number of Intervals in the Application of the
    Chi Square Test. H. B. MANN AND A. WALD ..................... 306

Limited Type of Primary Probability Distribution Applied to
    Annual Flood Flows. BRADFORD F. KIMBALL ..................... 318

Linear Restrictions on Chi-Square. FRANKLIN E. SATTERTHWAITE .... 326

Systems of Linear Equations with Coefficients Subject to Error.
    A. T. LONSETH ............................................... 332

On Mutually Favorable Events. KAI-LAI CHUNG ..................... 338

Observations on Analysis of Variance Theory. HILDA GEIRINGER .... 350





Vol. XIII, No. 3 — September, 1942 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor


A. T. CRAIG J. NEYMAN 


WITH THE COOPERATION OF 


H. C. CARVER        R. A. FISHER        R. VON MISES
H. CRAMÉR           T. C. FRY           E. S. PEARSON
W. E. DEMING        H. HOTELLING        H. L. RIETZ
G. DARMOIS          W. A. SHEWHART


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore, Md., or to the Secretary of the Institute
of Mathematical Statistics, E. G. Olds, Carnegie Institute of Technology,
Pittsburgh, Pa. Changes in mailing address which are to become effective for
a given issue should be reported to the Secretary on or before the 15th of the
month preceding the month of that issue. The months of issue are March,
June, September and December.


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS 
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts 
should be typewritten double-spaced with wide margins, and the original copy 
should be submitted. Footnotes should be reduced to a minimum and whenever 
possible replaced by a bibliography at the end of the paper; formulae in foot- 
notes should be avoided. Figures, charts, and diagrams should be drawn on 
plain white paper or tracing cloth in black India ink twice the size they are to 
be printed. Authors are requested to keep in mind typographical difficulties 
of complicated mathematical formulae. 


Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $5.00 per year. Single copies $1.50.
Back numbers are available at $5.00 per volume, or $1.50 per single issue. 


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879 
















ADDITIVE PARTITION FUNCTIONS AND A CLASS OF STATISTICAL 
HYPOTHESES 


By J. WOLFOWITZ
New York City 


1. Introduction. The purpose of the first part of this paper is to prove 
several theorems about a class of functions of partitions which are additive in 
structure and subject to mild restrictions. These theorems may be regarded as 
contributions to the theory of numbers, but if one makes certain assignments 
of probabilities to the partitions the theorems may be expressed as statements 
about asymptotic distributions. It is in this latter, probabilistic language, that 
we shall carry out the proofs, for the following reasons. The discussion will be 
more concise and certain circumlocutions will be avoided. The theorems have 
statistical application and a number of theorems discussed recently in statistical 
literature are corollaries of one of our theorems. 

In the second part of this paper the theory of testing statistical hypotheses 
where the form of the distribution functions is totally unknown and only con- 
tinuity is assumed, will be discussed. The exact extension of the likelihood 
ratio criterion to this case will be given. Approximations to the application of 
this criterion in two problems will be proposed, one of which applies the results 
mentioned above. Lastly, in connection with the second problem, a combina- 
torial problem will be solved which is new and has interest per se. 


2. Partitions of a single integer. Let $n$ be a positive integer and
$A = (a_1, a_2, \cdots, a_s)$ be any sequence of positive integers $a_i$
$(i = 1, 2, \cdots, s)$, where $\sum_{i=1}^{s} a_i = n$, and $s$ may be any
integer from 1 to $n$. Two sequences $A$ which have different elements or the
same elements arranged in different order are to be considered distinct, so it
is easy to see that there are $2^{n-1}$ sequences $A$. We shall consider the
sequence $A$ as a stochastic variable and assign to all sequences $A$ the same
probability, which is therefore $2^{-n+1}$. Let $r_j$ be the number of
elements $a$ in $A$ which equal $j$ $(j = 1, 2, \cdots, n)$, so that $r_j$ is a
stochastic variable. Let $k$ be an integer $< n$. Then the joint distribution
of the stochastic variables $r_1, r_2, \cdots, r_k$ is given as follows: The
probability that $r_i = b_i$ $(i = 1, 2, \cdots, k)$ is

(2.1)    $2^{-n+1} \sum_{r} \sum \dfrac{r!}{(b_1)!\,(b_2)! \cdots (b_k)!\,(r_{k+1})! \cdots (r_n)!}$,

where the inner summation is carried out over all sets of non-negative integers
$r_{k+1}, \cdots, r_n$ such that

(2.2)    $b_1 + b_2 + \cdots + b_k + r_{k+1} + \cdots + r_n = r$,
(2.3)    $b_1 + 2b_2 + \cdots + k b_k + (k+1) r_{k+1} + \cdots + n r_n = n$.


(The $b_i$, of course, are non-negative integers.)
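The count of $2^{n-1}$ sequences and the definition of the $r_j$ can be checked by direct enumeration for small $n$. The following is only an illustrative sketch; the helper names `compositions` and `part_counts` are assumptions introduced here and do not appear in the paper.

```python
from collections import Counter


def compositions(n):
    """Yield every ordered sequence of positive integers whose sum is n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def part_counts(seq):
    """Return a mapping j -> r_j, the number of elements of seq equal to j."""
    return Counter(seq)


n = 6
all_sequences = list(compositions(n))
print(len(all_sequences), 2 ** (n - 1))   # the two numbers should agree
print(part_counts((1, 2, 1, 2)))          # r_1 and r_2 for one sequence A
```

Each sequence then receives probability `1 / len(all_sequences)`, which is the uniform assignment $2^{-n+1}$ used in the text.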
Let $r = \sum_{i=1}^{n} r_i$, and

$r'_{(k+1)} = r_{k+1} + r_{k+2} + \cdots + r_n$    $(k < n)$,

so that $r$ and $r'_{(k+1)}$ are both stochastic variables. The probability
that at the same time

(2.4)    $r_i = b_i$    $(i = 1, 2, \cdots, k)$,

and

(2.5)    $r'_{(k+1)} = b'_{(k+1)}$

is given by (2.1) with the restriction

(2.6)    $r_{k+1} + \cdots + r_n = b'_{(k+1)}$,

added to the restrictions (2.2) and (2.3). With this added restriction the
summation in (2.1) may be performed as follows: Let $t = \sum_{i=1}^{k} i b_i$.
It is easy to see that the number of sequences $A$ where every $a_i > k$,
$r = r'_{(k+1)} = b'_{(k+1)}$, and $\sum a_i = n - t$, is given by the
coefficient of $x^{n-t}$ in the purely formal expansion in $x$ of

$(x^{k+1} + x^{k+2} + x^{k+3} + \cdots)^{b'_{(k+1)}} = \dfrac{x^{(k+1) b'_{(k+1)}}}{(1-x)^{b'_{(k+1)}}}$

and is

$\dbinom{n - t - k b'_{(k+1)} - 1}{b'_{(k+1)} - 1}$.

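Hence $P\{(2.4) \text{ and } (2.5)\}$, where this symbol will always denote the
probability of the relation in braces, is seen to be

(2.7)    $2^{-n+1}\, \dfrac{\left(\sum_{i=1}^{k} b_i + b'_{(k+1)}\right)!}{(b'_{(k+1)})!\, \prod_{i=1}^{k} (b_i)!}\, \dbinom{n - t - k b'_{(k+1)} - 1}{b'_{(k+1)} - 1}$.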
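The closed form just obtained can be compared with a brute-force count for small $n$. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the case $b'_{(k+1)} = 0$ (no element exceeding $k$) is treated separately by convention, since the binomial coefficient is then not defined in the usual sense.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial, prod


def compositions(n):
    """Yield every ordered sequence of positive integers whose sum is n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def prob_formula(n, k, b, b_tail):
    """Closed-form probability that r_i = b[i-1] (i <= k) and r'_(k+1) = b_tail."""
    t = sum(i * bi for i, bi in enumerate(b, start=1))
    if b_tail == 0:
        # Assumed convention: with no elements above k, the small parts must sum to n.
        ways_tail = 1 if t == n else 0
    else:
        top = n - t - k * b_tail - 1
        ways_tail = comb(top, b_tail - 1) if top >= 0 else 0
    arrangements = factorial(sum(b) + b_tail) // (
        factorial(b_tail) * prod(factorial(x) for x in b)
    )
    return Fraction(arrangements * ways_tail, 2 ** (n - 1))


def prob_bruteforce(n, k, b, b_tail):
    """Same probability obtained by listing all 2^(n-1) sequences."""
    hits = 0
    for A in compositions(n):
        c = Counter(A)
        small = tuple(c.get(i, 0) for i in range(1, k + 1))
        tail = sum(v for j, v in c.items() if j > k)
        hits += (small == tuple(b) and tail == b_tail)
    return Fraction(hits, 2 ** (n - 1))


print(prob_formula(7, 2, (2, 1), 1), prob_bruteforce(7, 2, (2, 1), 1))
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than to within rounding error.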


If $X$ is a stochastic variable, let $E(X)$ and $\sigma^2(X)$ denote,
respectively, the mean and variance of $X$ (if they exist), and if $Y$ is
another stochastic variable, let $\sigma(XY)$ be the covariance between $X$
and $Y$. Also let $\bar{X} = \dfrac{X - E(X)}{\sigma(X)}$. By the distribution
of $X$ we shall mean a function $\varphi(x)$ such that
$P\{X < x\} = \varphi(x)$. These conventions being established, we seek first
to evaluate $E(r_i)$.
This may be done by differentiating with respect to $y$ the coefficient of
$x^n$ in the purely formal expansion in $x$ of
$2^{-n+1}(x + x^2 + \cdots + x^{i-1} + y x^i + x^{i+1} + \cdots)^r$,
setting $y = 1$ and summing over all values of $r$. We have therefore to
evaluate

$2^{-n+1} \sum_{r=2}^{n} r \dbinom{n-i-1}{r-2}$,

which is easily seen to give us the result

(2.8)    $E(r_i) = (n - i + 3)\,2^{-i-1}$    $(i < n)$,

while it is obvious that

(2.9)    $E(r_n) = 2^{-n+1}$.
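The two expectations (2.8) and (2.9) can be verified exactly for small $n$ by averaging over all sequences $A$. The sketch below is illustrative only; `mean_r` and `mean_r_formula` are names chosen here, not taken from the paper.

```python
from fractions import Fraction


def compositions(n):
    """Yield every ordered sequence of positive integers whose sum is n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def mean_r(n, i):
    """E(r_i) computed by enumerating all 2^(n-1) equally likely sequences."""
    total = sum(A.count(i) for A in compositions(n))
    return Fraction(total, 2 ** (n - 1))


def mean_r_formula(n, i):
    """E(r_i) according to (2.8) for i < n and (2.9) for i = n."""
    if i == n:
        return Fraction(1, 2 ** (n - 1))
    return Fraction(n - i + 3, 2 ** (i + 1))


for i in range(1, 7):
    print(i, mean_r(6, i), mean_r_formula(6, i))
```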


By use of similar devices the variances and covariances of the r; may also be 
obtained. We omit the details of those calculations and also the presentation 
of the covariances, since the latter are not necessary for the proof of Theorem 2. 
The results are: 


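(2.10)    $\sigma^2(r_i) = n\left(\dfrac{1}{2^{i+1}} + \dfrac{3-2i}{2^{2i+2}}\right) + \left(\dfrac{3-i}{2^{i+1}} + \dfrac{3i^2 - 12i + 5}{2^{2i+2}}\right)$    $(i < \tfrac{1}{2}n)$.

The limitation on the value of $i$ is necessary because the processes for
summing binomial coefficients with the aid of the device described above are
no longer applicable. The matter is easily settled, however, for if $X$ is a
stochastic variable which can take only the values 0 or 1, then

$\sigma^2(X) = E(X) - [E(X)]^2$.

The $r_i$ for $i > \tfrac{1}{2}n$ are such variables, so that

(2.11)    $\sigma^2(r_i) = \dfrac{n-i+3}{2^{i+1}} - \dfrac{(n-i+3)^2}{2^{2i+2}}$    $(n > i > \tfrac{1}{2}n)$,

(2.12)    $\sigma^2(r_n) = \dfrac{1}{2^{n-1}} - \dfrac{1}{2^{2n-2}}$.

Also without difficulty we have

(2.13)    $\sigma^2(r_{n/2}) = \dfrac{n+6}{2^{\frac{1}{2}(n+4)}} - \dfrac{(n+6)^2}{2^{n+4}} + \dfrac{1}{2^{n-2}}$

when $n$ is even and $> 2$, and

(2.14)    $E(r) = \tfrac{1}{2}(n + 1)$,
(2.15)    $\sigma^2(r) = \tfrac{1}{4}(n - 1)$.

Finally,

(2.16)    $E(r'_{(k+1)}) = (n - k + 1)\,2^{-k-1}$.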
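The variance (2.10) and the moments (2.14) to (2.16) admit the same kind of exact check by enumeration. The sketch below assumes the forms $\sigma^2(r_i) = n(2^{-i-1} + (3-2i)2^{-2i-2}) + (3-i)2^{-i-1} + (3i^2-12i+5)2^{-2i-2}$ for $i < \tfrac12 n$, $E(r) = \tfrac12(n+1)$, $\sigma^2(r) = \tfrac14(n-1)$ and $E(r'_{(k+1)}) = (n-k+1)2^{-k-1}$; the helper names are hypothetical.

```python
from fractions import Fraction


def compositions(n):
    """Yield every ordered sequence of positive integers whose sum is n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def exact_moments(values):
    """Mean and (population) variance of equally likely values, as fractions."""
    mean = Fraction(sum(values), len(values))
    var = sum((Fraction(v) - mean) ** 2 for v in values) / len(values)
    return mean, var


def var_r_formula(n, i):
    """Variance of r_i in the assumed form of (2.10); requires 2 * i < n."""
    return (
        n * (Fraction(1, 2 ** (i + 1)) + Fraction(3 - 2 * i, 2 ** (2 * i + 2)))
        + Fraction(3 - i, 2 ** (i + 1))
        + Fraction(3 * i * i - 12 * i + 5, 2 ** (2 * i + 2))
    )


n = 9
comps = list(compositions(n))
for i in range(1, 5):
    _, var = exact_moments([A.count(i) for A in comps])
    print(i, var, var_r_formula(n, i))
print(exact_moments([len(A) for A in comps]))   # compare with (2.14), (2.15)
```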


The next results we shall need may be expressed in the following: 
THEOREM 1: As $n$ approaches infinity, the joint distribution of the stochastic
variables $\bar{r}_1, \cdots, \bar{r}_k, \bar{r}'_{(k+1)}$ ($k$ any fixed
positive integer), approaches the multivariate normal distribution.

This theorem is proved as follows: Make the substitutions

$u_i = \dfrac{r_i - n \cdot 2^{-i-1}}{\sqrt{n}}$    $(i = 1, 2, \cdots, k)$,

$u_{(k+1)} = \dfrac{r'_{(k+1)} - n \cdot 2^{-k-1}}{\sqrt{n}}$,

in the expression

$2^{-n+1}\, \dfrac{\left(\sum_{i=1}^{k} r_i + r'_{(k+1)}\right)!}{(r'_{(k+1)})!\, \prod_{i=1}^{k} (r_i)!}\, \dbinom{n - t - k r'_{(k+1)} - 1}{r'_{(k+1)} - 1}$,

which comes from (2.7), and regard $t$ as equal to $\sum_{i=1}^{k} i r_i$.
Replace the various factorials by their asymptotic approximations as given by
Stirling's formula and simplify the resulting expression. The subsequent
procedure is simple but laborious and we omit the details, which are like those
of the classical proof of De Moivre's theorem as given, for example, in
Fréchet [1], p. 89.

We now prove the following theorem on additive partition functions: 

THEOREM 2: Let $f(x)$ be a function defined for all positive integral values of
$x$ which fulfills the following conditions:

(a). There exists a pair of positive integers, $a$ and $b$, such that

(2.17)    $\dfrac{f(a)}{f(b)} \neq \dfrac{a}{b}$,

(b). the series

(2.18)    $\sum_{i=1}^{\infty} |f(i)|\, 2^{-i/2}$

converges. Let $F(A)$, a function of the stochastic sequence $A$, be defined as
follows:

(2.19)    $F(A) = \sum_{i=1}^{s} f(a_i)$.

Then for any real $y$ the probability of the inequality $\bar{F}(A) < y$
approaches

$\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2}y^2}\, dy$,

as $n \to \infty$.
We restate this theorem without use of probabilistic terms: 
Let $A$ be any sequence of positive integers whose sum is a given integer $n$.
Consider two sequences $A$ to be different if they contain different elements
or the same elements arranged in a different order. Let $f(x)$ and $F(A)$ be
defined as above, with the aforementioned restrictions. Then there exist, for
every positive integer $n$, two numbers $E_n$ and $\sigma_n$, such that
$2^{-n+1}$ multiplied by the number of sequences $A$ for which the inequality

$F(A) - E_n < y \sigma_n$

holds, approaches

$\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2}y^2}\, dy$,

as $n \to \infty$.
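For integer-valued $f$ the exact distribution of $F(A)$ over all sequences $A$ can be tabulated for moderate $n$ without listing the sequences, which gives a way to look at the statement numerically. The following is a sketch under assumptions: the function names are hypothetical, the recursion conditions on the first element of $A$, and $f(x) = x^2$ is merely one function for which the two conditions of the theorem hold.

```python
from collections import defaultdict
from fractions import Fraction


def distribution_of_F(n, f):
    """Exact distribution {value: probability} of F(A) = sum of f(a_i).

    counts[m][v] is the number of sequences with sum m and F equal to v;
    it is built up by conditioning on the first element of the sequence.
    """
    counts = [defaultdict(int) for _ in range(n + 1)]
    counts[0][0] = 1
    for m in range(1, n + 1):
        for first in range(1, m + 1):
            for value, c in counts[m - first].items():
                counts[m][value + f(first)] += c
    total = 2 ** (n - 1)
    return {v: Fraction(c, total) for v, c in counts[n].items()}


def mean_and_variance(dist):
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, var


def standardized_cdf(dist, y):
    """P{(F - E(F)) / sigma(F) < y}; requires a non-degenerate distribution."""
    mean, var = mean_and_variance(dist)
    sd = float(var) ** 0.5
    return float(sum(p for v, p in dist.items() if (v - mean) / sd < y))


dist = distribution_of_F(40, lambda x: x * x)
for y in (-1.0, 0.0, 1.0):
    print(y, standardized_cdf(dist, y))   # to be compared with the normal integral
```

With $f(x) = 1$ the function $F(A)$ reduces to $r$, the number of elements of $A$, so that the mean and variance returned can be compared with (2.14) and (2.15).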

For convenience, the proof will be divided into a number of lemmas. 

If $\varphi(y)$ is any continuous distribution function, then it is well known
that $\varphi(y)$ is uniformly continuous and that consequently, for any
arbitrarily small, positive $\epsilon$, there exist two positive numbers, $h$
and $D$, with the following properties:

(a). If $y_1$ and $y_2$ are any real numbers such that $|y_1 - y_2| < h$, then
$|\varphi(y_1) - \varphi(y_2)| < \epsilon$.

(b). If $y$ is such that $|y| > D$, then $\varphi(|y|) > 1 - \epsilon$, and
$\varphi(-|y|) < \epsilon$.

We now first prove 

LEMMA 1: Let $X$ and $Y$ be two stochastic variables, both of which possess
finite means and variances. Suppose that there exists a continuous distribution
function $\varphi(y)$ and two small positive numbers $\epsilon$ and $\delta$
(say $\epsilon < 1/10$, $\delta < 1/10$), such that

(2.20)    $|P\{\bar{X} < y\} - \varphi(y)| < \epsilon$

for all $y$, and

(2.21)    $\dfrac{\sigma(Y)}{\sigma(X)} = \delta$.


Let h and D be chosen as above for ¢(y), with the additional proviso that h < 4 
and D> 1 ~~ Suppose further that 


, h_ he 
> 6 eas a “ia 
(2.22) 6 < min (; ‘ :) 


Then

(2.23)    $|P\{\overline{(X + Y)} < y\} - \varphi(y)| < 3\epsilon$,

for all $y$.

PROOF: We have

$\sigma^2(X + Y) = \sigma^2(X) + 2\sigma(XY) + \sigma^2(Y)$.

Since, as is well known,

$|\sigma(XY)| \leq \sigma(X)\sigma(Y)$,





it follows from (2.21) that

(2.24)    $\sigma(X + Y) = (1 + \delta')\,\sigma(X)$,

where $|\delta'| \leq \delta$. Hence

(2.25)    $\sigma\!\left(\dfrac{Y - E(Y)}{\sigma(X + Y)}\right) < 2\delta$.

From Tchebycheff's inequality and (2.21) it then follows that, if $d = h/4$,

(2.26)    $P\left\{\left|\dfrac{Y - E(Y)}{\sigma(X + Y)}\right| > d\right\} < \dfrac{4\delta^2}{d^2}$,


and 


(2.27) ~. ee 
d? 

Now 
(x — MX) ) (x — B(x) 
> | —~d>=P - 2 i pm de 
rik = 8 ee 
X — E(X) 
o(X + Y) 
<P{(X+Y)<y}t+e 

Y — E(Y) | 
AXLY) <4 


+ P< <y- 


= P(X +Y)<y; 


Lee, 1 me) 
er ane 
(X — E(X) ) 

PE ae + db + 2. 
waa 


‘ 

>d>+-< 
) 

<= 


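Hence, from (2.24)

(2.29)    $P\{\bar{X} < (y - d)(1 + \delta')\} - \epsilon < P\{\overline{(X + Y)} < y\} < P\{\bar{X} < (y + d)(1 + \delta')\} + \epsilon$

and consequently, from (2.20)

(2.30)    $\varphi(y - d + y\delta' - d\delta') - 2\epsilon < P\{\overline{(X + Y)} < y\} < \varphi(y + d + y\delta' + d\delta') + 2\epsilon$.

Now if $|y| < 2D$, then from (2.22)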


+iw'l+elei<t+2 
=e ras 





and if | y| > 2D, then also from (2.22) 


| rt ion h 3, 3 
ly] -—d—|y'|—d/F|>\y|/A-8)-5> Glyl>5D. 


Recalling the definitions of $h$ and $D$, it follows from (2.30) that, for all $y$,

(2.31)    $\varphi(y) - 3\epsilon < P\{\overline{(X + Y)} < y\} < \varphi(y) + 3\epsilon$.


This proves Lemma 1. 
LEMMA 2: For any fixed pair $a$, $b$ of positive integers such that $a < b$,

(2.32)    $\lim_{n \to \infty} \dfrac{[E(r)]^{b-a}\,[E(r_b)]^{a}}{[E(r_a)]^{b}} = 1$.

PROOF: From (2.8), for fixed $i$,

$\dfrac{E(r_i)}{n} \to 2^{-i-1}$,

and from (2.14) $\dfrac{E(r)}{n} \to \dfrac{1}{2}$. The required result follows easily.
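The two limits invoked in this proof, $E(r_i)/n \to 2^{-i-1}$ from (2.8) and $E(r)/n \to \tfrac12$ from (2.14), can be watched numerically. This is a small illustrative sketch; the function names are introduced here only for the purpose.

```python
from fractions import Fraction


def mean_r(n, i):
    """E(r_i) from (2.8), valid for i < n."""
    return Fraction(n - i + 3, 2 ** (i + 1))


def mean_total(n):
    """E(r) from (2.14)."""
    return Fraction(n + 1, 2)


for n in (10, 1000, 100000):
    # The two ratios should settle near 2**(-3) and 1/2 respectively.
    print(n, float(mean_r(n, 2) / n), float(mean_total(n) / n))
```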
’ 


For any $n$ we now define

$B(k;n) = \sum_{i=1}^{k} r_i f(i)$,

$C(k;n) = \sum_{i=k+1}^{n} r_i f(i)$.

Then

$F(A) = B(k;n) + C(k;n)$.


LEMMA 3: For any real $y$ and any fixed positive integral $k$ the probability
that the stochastic variable $\bar{B}(k;n)$ shall fulfill the inequality
$\bar{B}(k;n) < y$ approaches

$\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2}y^2}\, dy$ as $n \to \infty$.

PROOF: By Theorem 1, the stochastic variables
$\bar{r}_1, \bar{r}_2, \cdots, \bar{r}_k, \bar{r}'_{(k+1)}$ are asymptotically
jointly normally distributed. As an immediate consequence so are the variables
$\bar{r}_1, \bar{r}_2, \cdots, \bar{r}_k$ and hence $B(k;n)$, which is a linear
function with constant coefficients $f(1), f(2), \cdots, f(k)$ of
$r_1, r_2, \cdots, r_k$, is asymptotically normally distributed.

LEMMA 4. There exists a constant $c > 0$, such that, for all $n$ sufficiently large,

(2.33)    $\sigma^2(F(A)) > cn$.

PROOF: For any sufficiently large, arbitrary, but fixed $n$, we will construct
two sets, $S_1$ and $S_2$, of sequences $A$, with the following properties:
$S_1$ and $S_2$ have the same probability $p$, with $p$ always greater than
$\beta$, a fixed positive constant which does not depend on $n$. Since the
probabilities of $S_1$ and $S_2$ are equal, each possesses the same number of
sequences $A$. Between the member sequences of the sets $S_1$ and $S_2$ we will
establish a one-to-one correspondence such that, if $A_1$ is a member of $S_1$
and $A_2$ is its corresponding sequence in $S_2$, then

(2.34)    $|F(A_1) - F(A_2)| > 2d\sqrt{n}$,

where $d$ is a fixed positive constant which does not depend on $n$.

It is easy to see that such a construction would prove the lemma. The
probability of any sequence $A$ is $2^{-n+1}$. Hence the contribution of a
corresponding pair $A_1$ and $A_2$ to the variance of $F(A)$ is by (2.34) not
less than $2^{-n+2} d^2 n$ and the contribution of the sets $S_1$ and $S_2$ is
not less than $2\beta d^2 n$.

It remains then to carry out the construction of $S_1$ and $S_2$. For the sake
of simplicity in notation, we shall carry out the construction with the
assumption that the integers $a$ and $b$ of (2.17) are 1 and 2. It will be
readily apparent, however, that the proof is perfectly general and with
trivial changes holds for any pair $a$, $b$. This lemma is the only place
where the hypothesis (2.17) is used. The latter condition is necessary because,
if for every pair of positive integers $i$ and $j$,

$\dfrac{f(i)}{f(j)} = \dfrac{i}{j}$,

then $F(A)$ is a constant multiple of $n$, for $n = \sum_i i r_i$ and then

$F(A) = \sum_i f(a_i) = \sum_i r_i f(i) = f(1) \sum_i i r_i = n f(1)$.
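The degenerate case just described is easy to exhibit: when $f(i) = i f(1)$ for every $i$, the function $F(A)$ takes the single value $n f(1)$ on all sequences $A$, so no limiting normal law is possible. A small sketch, with hypothetical helper names and the arbitrary choice $f(1) = 3$:

```python
def compositions(n):
    """Yield every ordered sequence of positive integers whose sum is n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def F(A, f):
    """Additive function of the sequence A: the sum of f over its elements."""
    return sum(f(a) for a in A)


n = 7
linear_values = {F(A, lambda x: 3 * x) for A in compositions(n)}   # f(i) = i * f(1)
square_values = {F(A, lambda x: x * x) for A in compositions(n)}   # violates f(i)/f(j) = i/j
print(linear_values)
print(sorted(square_values))
```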
a t t 
Each sequence $A$ uniquely determines the "coordinate" complex

$\{r_1, r_2, \cdots, r_n\}$

which we prefer to write as the pair $L = (l, l')$:

$l = \{r_1, r_2\}$,
$l' = \{r_3, r_4, \cdots, r_n\}$.

To each pair $(l, l')$ there correspond in general many sequences $A$ whose
exact number may be explicitly given in terms of factorials. The totality of
all $A$ whose $L$ have the same second member $l'$ will be called the group
determined by $l'$, or just the group $l'$. The subset of a group $l'$ all of
whose $A$ have the same $r_1$ will be called the family $(l', r_1)$. All the
$A$ in the same family have the same $L$. For $l'$ and $r_1$ determine $r_2$
through the equation $\sum_i i r_i = n$.

According to Theorem 1 for $k = 2$, $\bar{r}_1, \bar{r}_2, \bar{r}'_{(3)}$ are
asymptotically jointly normally distributed. Let

$\lim_{n \to \infty} \dfrac{\sigma(r_1)}{\sqrt{n}} = \sigma_1$.







The limiting variances of $r_2$ and $r'_{(3)}$ are constant multiples of
$n\sigma_1^2$. Therefore the set $H$ of all $A$ whose $L$ satisfy the
constraints

$\dfrac{n}{4} < r_1 < \dfrac{n}{4} + \sqrt{n}\,\sigma_1$,

$\dfrac{n}{8} < r_2 < \dfrac{n}{8} + \sqrt{n}\,\sigma_1$,

$\dfrac{n}{8} < r'_{(3)} < \dfrac{n}{8} + \sqrt{n}\,\sigma_1$

has, by virtue of the fact that the limiting correlation coefficients of the
variables $r_1, r_2, r'_{(3)}$ are all less than 1 in absolute value, a
positive probability, which exceeds a fixed positive constant $\gamma$ for
sufficiently large $n$. If any member sequence $A$ of a family is in $H$, the
entire family is obviously in $H$. Any sequence $A$ belongs to one and only one
family. Hence the set $H$ may be decomposed in a disjunct way into entire
families. Let $\left(l', \dfrac{n}{4} + h_1\right)$ be any family in $H$, where
of course $0 < h_1 < \sqrt{n}\,\sigma_1$. Consider the (second) family
$\left(l', \dfrac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1\right)$. This family is not
in $H$. We now wish to show that the probability of the second family exceeds
$c'$ times the probability of the first family, where $c'$ is a fixed positive
constant which does not depend on either $n$ or the particular families in
question.

For the first family, let

$r_1 = \dfrac{n}{4} + h_1$,
$r_2 = \dfrac{n}{8} + h_2$,
$r'_{(3)} = \dfrac{n}{8} + h_3$,
$r = \dfrac{n}{2} + h_1 + h_2 + h_3$.

Hence

(2.36)    $0 < h_i < \sqrt{n}\,\sigma_1$    $(i = 1, 2, 3)$.

For the second family we therefore have, since both families are in the same
group,

$r_1 = \dfrac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1$,
$r_2 = \dfrac{n}{8} - \sqrt{n}\,\sigma_1 + h_2$,
$r'_{(3)} = \dfrac{n}{8} + h_3$,
$r = \dfrac{n}{2} + \sqrt{n}\,\sigma_1 + h_1 + h_2 + h_3$.


The ratio of the probability of the second family to that of the first family 
equals the ratio of the number of sequences A in the second family to the 
number of sequences A in the first family. By elementary combinatorics, since 
both families are in the same group, the latter ratio is 


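(2.37)    $\dfrac{\left(\frac{n}{2} + \sqrt{n}\,\sigma_1 + h_1 + h_2 + h_3\right)!\,\left(\frac{n}{4} + h_1\right)!\,\left(\frac{n}{8} + h_2\right)!}{\left(\frac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1\right)!\,\left(\frac{n}{8} - \sqrt{n}\,\sigma_1 + h_2\right)!\,\left(\frac{n}{2} + h_1 + h_2 + h_3\right)!}$

and hence exceeds

(2.38)    $\left(\dfrac{n}{2} + h_1 + h_2 + h_3\right)^{\sqrt{n}\,\sigma_1} \left(\dfrac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1\right)^{-2\sqrt{n}\,\sigma_1} \left(\dfrac{n}{8} - \sqrt{n}\,\sigma_1 + h_2\right)^{\sqrt{n}\,\sigma_1}$.

At this point, if we had been using the numbers $a$ and $b$ of (2.17), we would
make use of Lemma 2. In the present case the result of that lemma is trivial.
It is easy to see, therefore, that (2.38) equals

(2.39)    $\left(1 + \dfrac{2h_1 + 2h_2 + 2h_3}{n}\right)^{\sqrt{n}\,\sigma_1} \left(1 + \dfrac{8\sqrt{n}\,\sigma_1 + 4h_1}{n}\right)^{-2\sqrt{n}\,\sigma_1} \left(1 - \dfrac{8\sqrt{n}\,\sigma_1 - 8h_2}{n}\right)^{\sqrt{n}\,\sigma_1}$,

which, in view of (2.36), exceeds

(2.40)    $\left(1 + \dfrac{12\sigma_1}{\sqrt{n}}\right)^{-2\sqrt{n}\,\sigma_1} \left(1 - \dfrac{8\sigma_1}{\sqrt{n}}\right)^{\sqrt{n}\,\sigma_1}$,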


which, in turn, for sufficiently large n, exceeds 
(2.41) 1. — 1,780: 
We are now ready to construct $S_1$ and $S_2$. Let

$f_1 = (l', r_1)$

be any family in $H$ and consider the family

$f_2 = (l', r_1 + 2\sqrt{n}\,\sigma_1)$.

Select in any manner whatsoever $c'\nu$ of the sequences $A$ in $f_1$, where
$\nu$ is the total number of sequences in $f_1$. Call this set of sequences
$f^*$. Select in any manner whatsoever $c'\nu$ sequences from $f_2$ and call
this set $f^{**}$. That there exist at least $c'\nu$ sequences in $f_2$ is
assured by equation (2.41). In any manner whatsoever establish a one-to-one
correspondence between the sequences of $f^*$ and $f^{**}$. Suppose $A_1$ and
$A_2$ are corresponding sequences. Since $f^*$ and $f^{**}$ belong to the same
group, and since $f(2) \neq 2f(1)$, we have

(2.42)    $|F(A_1) - F(A_2)| = |f(2)\sqrt{n}\,\sigma_1 - 2f(1)\sqrt{n}\,\sigma_1| = |f(2) - 2f(1)|\,\sqrt{n}\,\sigma_1$,

so that (2.34) holds with

(2.43)    $d = \tfrac{1}{2}\,|f(2) - 2f(1)|\,\sigma_1$.

Now proceed in this manner for all the families $f_1$ in $H$. The union of all
the sets $f^*$ is the set $S_1$ and the union of all the sets $f^{**}$ is the
set $S_2$. It is clear that, since the probability of $H$ exceeds $\gamma$, the
probability $p$ of $S_1$ exceeds $\beta = c'\gamma$. This proves Lemma 4.

LEMMA 5. For any arbitrarily small positive number $\xi$ there exists a positive
integer $\mu(\xi)$, such that for any $k > \mu(\xi)$ and all $n$ greater than a
fixed lower bound,

(2.44)    $\sigma^2[C(k;n)] < \xi n$.

PROOF: Since

$C(k;n) = \sum_{i=k+1}^{n} r_i f(i)$,

and, as is well known,

$|\sigma(XY)| \leq \sigma(X)\sigma(Y)$,

we have

(2.45)    $\sigma^2[C(k;n)] \leq \left[\sum_{i=k+1}^{n} |f(i)|\,\sigma(r_i)\right]^2$.


i=k+1 


From (2.10) it follows readily that 
, 9 n 5 —1 37° 
(2.46) o (r;) 4 5i + oes +( iti + z 
and the quantity in parentheses in the right member of (2.46) is easily seen to
be negative, so that, for $i < \tfrac{1}{2}n$ and $n > 3$,
(2.47) o(r:) < V2n2" 
From (2.11) and the definition of $r_i$, it follows easily that (2.47) holds also
when $i > \tfrac{1}{2}n$ and $n > 3$.

Hence, in view of (2.12), (2.13), and the convergence of the series in (2.18), 
the desired result follows from (2.45). 

LEMMA 6. Let the $\xi$ of Lemma 5 be $< \tfrac{1}{4}c$, where $c$ is as in
Lemma 4. Then for $k > \mu(\xi)$ and $n$ larger than a fixed lower bound

(2.48)    $\sigma^2(B(k;n)) > \tfrac{1}{4}cn$.

PROOF: Since

$F(A) = B(k;n) + C(k;n)$,

we have

$\sigma^2(F(A)) = \sigma^2(B(k;n)) + \sigma^2(C(k;n)) + 2\sigma(BC) \leq \sigma^2(B) + \sigma^2(C) + 2\sigma(B)\sigma(C) = (\sigma(B) + \sigma(C))^2$.

Hence from (2.33) and (2.44) $\sqrt{cn} < \sigma(B) + \tfrac{1}{2}\sqrt{cn}$ and
the required result follows.
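The last step may be spelled out as follows. This is only a worked restatement, using (2.33), (2.44), the hypothesis that the constant of Lemma 5 is less than $\tfrac14 c$, and the inequality displayed in the proof:

```latex
\begin{align*}
\sqrt{cn} &< \sigma(F(A)) \le \sigma(B) + \sigma(C)
  && \text{by (2.33) and the inequality above},\\
\sigma(C) &< \sqrt{\xi n} < \tfrac{1}{2}\sqrt{cn}
  && \text{by (2.44) with } \xi < \tfrac{1}{4}c,\\
\sigma(B) &> \sqrt{cn} - \tfrac{1}{2}\sqrt{cn} = \tfrac{1}{2}\sqrt{cn},
  && \text{so that } \sigma^2(B) > \tfrac{1}{4}cn .
\end{align*}
```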

PROOF OF THE THEOREM: Let $\epsilon$ be an arbitrarily small positive number.
For all $n$ sufficiently large we have, by Lemma 3,

$\left| P\{\bar{B}(k;n) < y\} - \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2}y^2}\, dy \right| < \epsilon$,

for all $y$. For a small $\xi$ to be chosen later and large enough $k$ and $n$
we have, by Lemmas 5 and 6,

(2.49)    $\dfrac{\sigma(C(k;n))}{\sigma(B(k;n))} = \delta < \sqrt{\dfrac{4\xi}{c}}$.

Now let the $\varphi(y)$ of Lemma 1 be defined as

$\varphi(y) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2}y^2}\, dy$,

and choose $h$ and $D$ as in Lemma 1 for our present $\epsilon$. Since $c$ is
fixed and $\xi$ still at our disposal, choose $\xi$ sufficiently small so that
the $\delta$ of (2.49) satisfies (2.22). Since the hypothesis of Lemma 1 is
satisfied, we have, from (2.23) and Lemma 3, for all $n$ sufficiently large,

$|P\{\bar{F}(A) < y\} - \varphi(y)| < 3\epsilon$

for all $y$. This is the required result.


3. Partitions of two integers. Let $n_1$ and $n_2$ be positive integers,
$n_1 + n_2 = n$, $\dfrac{n_1}{n} = e_1$, $\dfrac{n_2}{n} = e_2$, and
$e = \max(e_1, e_2)$. Let $V = (v_1, v_2, \cdots, v_s)$ be any sequence of
positive integers $v_i$ $(i = 1, 2, \cdots, s)$ where
$v_1 + v_3 + v_5 + \cdots$ equals either one of $n_1$ and $n_2$, while
$v_2 + v_4 + v_6 + \cdots$ equals the other. Such sequences are of statistical
importance (cf. Wald and Wolfowitz [2]). As before, sequences $V$ with
different elements or with the same elements in different order will be
considered different and to each sequence $V$ will be assigned the same
probability, which is therefore easily seen to be

$\dfrac{n_1!\, n_2!}{n!}$.
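One way to picture the sequences $V$ is as the run lengths of an arrangement of $n_1$ elements of one kind and $n_2$ of another. The sketch below lists them for a small case; it is an illustration under the assumption $n_1 \neq n_2$ (so that the alternating sums identify which kind comes first), and the generator name `run_lengths` is introduced here for the purpose.

```python
from itertools import combinations, groupby
from math import comb


def run_lengths(n1, n2):
    """Yield (V, first_kind) for each arrangement of n1 ones and n2 twos.

    V is the tuple of successive run lengths; first_kind is the kind (1 or 2)
    of the element that opens the arrangement.
    """
    n = n1 + n2
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        labels = [1 if j in chosen else 2 for j in range(n)]
        V = tuple(len(list(group)) for _, group in groupby(labels))
        yield V, labels[0]


sequences = list(run_lengths(3, 2))
print(len(sequences), comb(5, 3))   # number of sequences V and n! / (n1! n2!)
for V, first in sequences[:3]:
    print(V, sum(V[0::2]), sum(V[1::2]), first)
```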

Let $r_{1i}$ be the number of elements equal to $i$ in that one of the two
sequences $(v_1, v_3, v_5, \cdots)$ and $(v_2, v_4, v_6, \cdots)$ the sum of
whose elements is $n_1$ and let $r_{2i}$ be the corresponding number for the
other sequence. Let

$s_i = r_{1i} + r_{2i}$,

$r_1 = \sum_i r_{1i}$,    $r_2 = \sum_i r_{2i}$,

$r'_{1(k+1)} = \sum_{i \geq k+1} r_{1i}$,

$r'_{2(k+1)} = \sum_{i \geq k+1} r_{2i}$.

The necessary computations such as are given in the beginning of the previous
section have been performed by Mood [3] and we summarize them as follows:

THEOREM 3 (Mood): As $n$ approaches infinity while $e_1$ and $e_2$ remain
constant, the joint distribution of the stochastic variables

$\bar{r}_{11}, \bar{r}_{12}, \cdots, \bar{r}_{1k}, \bar{r}'_{1(k+1)}, \bar{r}_{21}, \bar{r}_{22}, \cdots, \bar{r}_{2k}$

(where $k$ is any fixed positive integer), approaches the multivariate normal
distribution.


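Mood (loc. cit.) gives the following parameters, with the convention that

(3.1)    $x^{(i)} = x(x - 1)(x - 2) \cdots (x - i + 1)$:

(3.2)    $E(r_{1i}) = \dfrac{(n_2 + 1)^{(2)}\, n_1^{(i)}}{n^{(i+1)}}$,

(3.3)    $\lim_{n \to \infty} \dfrac{E(r_{1i})}{n} = e_1^{i} e_2^{2}$,

(3.4)    $\lim_{n \to \infty} \dfrac{E(r'_{1(k+1)})}{n} = e_1^{k+1} e_2$,

(3.5)    $\sigma^2(r_{1i}) = \dfrac{n_1^{(2i)}\, n_2^{(2)}\, (n_2 + 1)^{(2)}}{n^{(2i+2)}} + \dfrac{n_1^{(i)}\, (n_2 + 1)^{(2)}}{n^{(i+1)}} \left(1 - \dfrac{n_1^{(i)}\, (n_2 + 1)^{(2)}}{n^{(i+1)}}\right)$,

(3.6)    $\lim_{n \to \infty} \dfrac{\sigma^2(r_{1i})}{n} = e_1^{2i-1} e_2^{3} \left[(i + 1)^2 e_1 e_2 - i^2 e_2 - 2e_1\right] + e_1^{i} e_2^{2}$.

The corresponding parameters for $r_{2i}$ may be obtained from the above by
interchange of $n_1$ and $n_2$. Also

(3.7)    $\lim_{n \to \infty} \dfrac{E(r_1)}{n} = \lim_{n \to \infty} \dfrac{E(r_2)}{n} = e_1 e_2$.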
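The exact mean and variance quoted from Mood can be compared with direct enumeration over all arrangements for small $n_1$, $n_2$. The sketch below assumes the forms $E(r_{1i}) = (n_2+1)^{(2)} n_1^{(i)} / n^{(i+1)}$ and $\sigma^2(r_{1i}) = n_1^{(2i)} n_2^{(2)} (n_2+1)^{(2)} / n^{(2i+2)} + E(r_{1i})(1 - E(r_{1i}))$, with $n_2 \geq 2$ and $n \geq 2i + 2$; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, groupby


def falling(x, i):
    """x^(i) = x (x - 1) ... (x - i + 1), the convention (3.1)."""
    out = 1
    for j in range(i):
        out *= x - j
    return out


def r1_counts(n1, n2, i):
    """Values of r_{1i} (runs of kind 1 having length i) over all arrangements."""
    n = n1 + n2
    values = []
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        labels = [1 if j in chosen else 2 for j in range(n)]
        values.append(
            sum(1 for kind, g in groupby(labels) if kind == 1 and len(list(g)) == i)
        )
    return values


def exact_moments(values):
    mean = Fraction(sum(values), len(values))
    var = sum((Fraction(v) - mean) ** 2 for v in values) / len(values)
    return mean, var


def mean_formula(n1, n2, i):
    n = n1 + n2
    return Fraction(falling(n1, i) * falling(n2 + 1, 2), falling(n, i + 1))


def var_formula(n1, n2, i):
    n = n1 + n2
    m = mean_formula(n1, n2, i)
    first = Fraction(
        falling(n1, 2 * i) * falling(n2, 2) * falling(n2 + 1, 2),
        falling(n, 2 * i + 2),
    )
    return first + m * (1 - m)


print(exact_moments(r1_counts(4, 3, 1)), mean_formula(4, 3, 1), var_formula(4, 3, 1))
```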


For additive partition functions we have the following theorem: 

THEOREM 4. Let $f(x)$ be a function defined for all positive integral values of
$x$ which fulfills the following conditions:

a) There exists a pair of positive integers, $a$ and $b$, such that

(3.8)    $\dfrac{f(a)}{f(b)} \neq \dfrac{a}{b}$,

b) the series

(3.9)    $\sum_{i=1}^{\infty} |f(i)|\, e^{i/2}$

converges. Let $F(V)$, a function of the stochastic sequence $V$, be defined as
follows:

(3.10)    $F(V) = \sum_{i=1}^{s} f(v_i)$.

Then for any real $y$ the probability of the inequality $\bar{F}(V) < y$
approaches

$\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2}y^2}\, dy$

as $n \to \infty$, while $e_1$ and $e_2$ remain constant.

The basic idea of the proof of this theorem is the same as that of the proof 
of Theorem 2. We omit all the steps which can be written without difficulty 
by analogy to those in Theorem 2 and present only those where some major 
change is necessary. The numbering of the lemmas will correspond to that of 
Theorem 2. 

LEMMA 2. For any fixed pair, $a$ and $b$, of positive integers such that $a < b$,


(3.11) [E(rs)P*-(E (v2) -(E(rw) (Er) PTE (re) [Ere >, 


as $n \to \infty$.
The proof is the same as before. 
The following are the definitions corresponding to those of Theorem 2: 


$B(k;n) = \sum_{i=1}^{k} s_i f(i)$,

$C(k;n) = \sum_{i=k+1}^{n} s_i f(i)$.

Then as before

$F(V) = B(k;n) + C(k;n)$.


LEMMA 4. Statement is the same as that for Theorem 2. The following important
changes must be made in the proof:
Each sequence $V$ determines the coordinate complex


47s *** % Tin| 







113, °** Tin) 


15 °° 9 Fle 


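The set $H$ is the set of all $V$ whose $L$ satisfy the constraints

$n e_1 e_2^2 < r_{11} < n e_1 e_2^2 + \sqrt{n}\,\sigma_{11}$,
$n e_1^2 e_2^2 < r_{12} < n e_1^2 e_2^2 + \sqrt{n}\,\sigma_{11}$,
$n e_1^2 e_2 < r_{21} < n e_1^2 e_2 + \sqrt{n}\,\sigma_{11}$,
$n e_1^2 e_2^2 < r_{22} < n e_1^2 e_2^2 + \sqrt{n}\,\sigma_{11}$,
$n e_1^3 e_2 < r'_{1(3)} < n e_1^3 e_2 + \sqrt{n}\,\sigma_{11}$,

where

$\sigma_{11} = \lim_{n \to \infty} \dfrac{\sigma(r_{11})}{\sqrt{n}}$.

The representative family for $H$ is characterized by

$(l', n e_1 e_2^2 + h_{11})$,

and this family is compared with the family

$(l', n e_1 e_2^2 + 2\sqrt{n}\,\sigma_{11} + h_{11})$.

For the members of the family in $H$

$r_{11} = n e_1 e_2^2 + h_{11} = n m_{11} + h_{11}$,
$r_{12} = n e_1^2 e_2^2 + h_{12} = n m_{12} + h_{12}$,
$r_{21} = n e_1^2 e_2 + h_{21} = n m_{21} + h_{21}$,
$r_{22} = n e_1^2 e_2^2 + h_{22} = n m_{22} + h_{22}$,
$r'_{1(3)} = n e_1^3 e_2 + h'_{13} = n m'_{13} + h'_{13}$,
$r_1 = n e_1 e_2 + h = n m + h$,
$|r_2 - r_1| \leq 1$,

where

(3.12)    $h_{ij} < \sqrt{n}\,\sigma_{11}$,
(3.13)    $h = h_{11} + h_{12} + h'_{13}$.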
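And for the members of the second family

$r_{11} = n m_{11} + 2\sqrt{n}\,\sigma_{11} + h_{11}$,
$r_{12} = n m_{12} - \sqrt{n}\,\sigma_{11} + h_{12}$,
$r_{21} = n m_{21} + 2\sqrt{n}\,\sigma_{11} + h_{21} + \theta_{21}$,
$r_{22} = n m_{22} - \sqrt{n}\,\sigma_{11} + h_{22} + \theta_{22}$,
$r'_{1(3)} = n m'_{13} + h'_{13}$,
$r_1 = n m + \sqrt{n}\,\sigma_{11} + h$,
$|r_2 - r_1| \leq 1$,

with

$|\theta_{21}| < 1$,    $|\theta_{22}| < 1$.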



















To the expression (2.37) corresponds the expression (3.14), with | @| < 1: 


(nmu + hy)! (nmy + hi)! x (nma + ho)! (nm: » + ho)! 
(nm +h)! (nm + h)! 
(3.14) (nm +h+~V/nou)! 
eae + 20/nou+ hu)! (nme — /no ou + hy)! 


(nm + ht Vn ou + g)! 
* me + 20/ n ou + hay + 62)! (nme = / n Ol a+ he + B20) ! 


? 


which exceeds 
(nm + ers X (nmu + 2/n xitay _ 
x (nmu — non + hy)¥* ™ 
x (nets + 90a on + hn) OY" 
X (nmx — Wnou + hoo)? , 


Employing Lemma 2, we find that (3.15) equals 


¢ + A) ae p 4 2V/n ou + “ eee 
nm 


nm 


(3.15) 


] 


-{- 


xX {1 


-+f- 


= vn ou + a. noe 


nM. 


(3.16) 
NM2 


: +> 


( 
( 

x x (1 4 2V non =" Vien 
x (145 


J/nou + _ nou 


NMe2 


In view of (3.12) and (3.13), (3.16) exceeds








ae x - eee 
(3 17) mmy nM 2 
17) . i 
x (1 48 3V/n =) patie x (1 i. Vn ay — 
nM2 NM22 , 




which, for sufficiently large n, in turn exceeds 


(3.18) feet ($ en ee :) wid. 
mi Mie moi M22 
LEMMA 5. Statement is the same as for Theorem 2. The proof then proceeds as
follows:
We have

(3.19)    $\sigma^2(C(k;n)) \leq \left(\sum_{i=1}^{2} \sum_{j=k+1}^{n} |f(j)|\,\sigma(r_{ij})\right)^2$.

From an examination of (3.5) and (3.6) we may see without any difficulty that 
the second of the three terms of the right member of (3.5) (after removal of 
parentheses) is asymptotically equal to n times the last term of the right member 
of (3.6) and hence that the other two terms of the right member of (3.5) are 
asymptotically equal to n times the right member of (3.6) without its last term. 
Now when 


fle cy 


which will always occur when $i$ is equal to or greater than a sufficiently
large fixed integer $\mu$, that part of the right member of (3.6) which is in
square brackets is easily seen to be negative. Hence from the definition of
asymptotic equivalence it follows that, for all $n$ sufficiently large,

(3.20)    $\dfrac{n_1^{(2\mu)}\, (n_2 + 1)^{(2)}\, n_2^{(2)}}{n^{(2\mu+2)}} - \dfrac{(n_2 + 1)^{(2)}\, (n_2 + 1)^{(2)}\, n_1^{(\mu)}\, n_1^{(\mu)}}{n^{(\mu+1)}\, n^{(\mu+1)}} < 0$,

and

(3.21)    $\dfrac{(n_2 + 1)^{(2)}\, n_1^{(\mu)}}{n^{(\mu+1)}} < 2n e_1^{\mu} e_2^{2} < 2n e^{\mu}$.

Hence, for all $n$ sufficiently large,

(3.22)    $\sigma^2(r_{1\mu}) < 2n e^{\mu}$.


Now consider the expression (3.5) for $i = \mu$ and $i = \mu + 1$. Passage from
$\mu$ to $\mu + 1$ multiplies the first term of the right member of (3.5) by

(3.23)    $\dfrac{(n_1 - 2\mu)(n_1 - 2\mu - 1)}{(n - 2\mu - 2)(n - 2\mu - 3)}$

and the third term of the right member by

(3.24)    $\dfrac{(n_1 - \mu)^2}{(n - \mu - 1)^2}$.
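These two multipliers follow directly from the convention (3.1), on the assumption that the first term of (3.5) depends on $i$ through $n_1^{(2i)} / n^{(2i+2)}$ and the third through $\bigl(n_1^{(i)} / n^{(i+1)}\bigr)^2$. A short worked version of the step:

```latex
\begin{align*}
\frac{n_1^{(2\mu+2)}}{n_1^{(2\mu)}} &= (n_1 - 2\mu)(n_1 - 2\mu - 1), &
\frac{n^{(2\mu+4)}}{n^{(2\mu+2)}} &= (n - 2\mu - 2)(n - 2\mu - 3),\\[4pt]
\frac{n_1^{(\mu+1)}}{n_1^{(\mu)}} &= n_1 - \mu, &
\frac{n^{(\mu+2)}}{n^{(\mu+1)}} &= n - \mu - 1 .
\end{align*}
```

Dividing the first pair gives the factor for the first term, and squaring the quotient of the second pair gives the factor for the third term.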
It is easy to see that for large but fixed $\mu$ and all $n$ greater than a
lower bound which is a function of $\mu$ only, the expression (3.23) is less
than the expression (3.24). Hence, in view of (3.20), the sum of the first and
third terms of the















right member of (3.5) for $i = \mu + 1$ is negative. Now consider what happens
to the second term of the right member of (3.5) when $i$ goes from $\mu$ to
$\mu + 1$. It is multiplied by

(3.25)    $\dfrac{n_1 - \mu}{n - \mu - 1}$,

which, also for large but fixed $\mu$ and all $n$ larger than a lower bound
which is a function of $\mu$ only, is easily seen to be less than $e$.
Consequently

(3.26)    $\sigma^2(r_{1(\mu+1)}) < 2n e^{\mu+1}$.

It can be seen without difficulty that such a passage of (3.5) to the next
higher index is always accompanied by multiplication by expressions similar to
(3.23), (3.24), and (3.25), for which similar inequalities hold and that
consequently

(3.27)    $0 < \sigma^2(r_{1i}) < 2n e^{i}$,

and for similar reasons

$0 < \sigma^2(r_{2i}) < 2n e^{i}$,

for all $i$ not less than $\mu$ and for all $n$ greater than a lower bound which
is a function of $\mu$ only (although it may be necessary to increase the
original $\mu$ so that both the last two inequalities hold). The required
result follows from (3.19) and the convergence of the series (3.9).

The proof of Theorem 4 follows along the same lines as that of Theorem 2. 

When $f(x) \equiv 1$, $F(V) = U(V)$, the statistic discussed in [2]. Other such
results follow from specialization of $f(x)$. Theorem 4 may also be generalized
so that the elements $v_i$ which add up to $n_1$ are operated on by a function
$f_1$, while the elements $v_i$ which add up to $n_2$ are operated on by another
function $f_2$, but this is easy to see and we do not go into the details.





4. Tests of hypotheses in the non-parametric case. The great advances 
that have been made in mathematical statistics in recent years have been in 
two directions. On the one hand, the foundations of statistics, the theory of 
estimation and of testing hypotheses have been put on a rigorous basis of 
probability theory, and on the other, powerful methods for obtaining critical 
regions and confidence intervals and criteria for appraising their efficacy have 
been developed. Most of these developments have this feature in common, 
that the distribution functions of the various stochastic variables which enter 
into their problems are assumed to be of known functional form, and the theories 
of estimation and of testing hypotheses are theories of estimation of and of 
testing hypotheses about, one or more parameters, finite in number, the knowl- 
edge of which would completely determine the various distribution functions 
involved. We shall refer to this situation for brevity as the parametric case, 
and denote the opposite situation, where the functional forms of the distributions 
are unknown, as the non-parametric case. 










The literature of theoretical statistics, therefore, deals principally with the 
parametric case. The reasons for this are perhaps partly historic, and partly 
the fact that interesting results could more readily be expected to follow from 
the assumption of normality. Another reason is that, while the parametric 
case was for long developed on an intuitive basis, progress in the non-parametric 
case requires the use of modern notions. However, the needs of theoretical 
completeness and of practical research require the development of the theory 
of the non-parametric case. The purpose of the following section is to con- 
tribute to this theory. 

Brief mention of some of the literature may be made here. The problem of
parametric estimation by confidence intervals was put on a rigorous foundation
by Neyman [4] and extended to the estimation of distribution functions in the 
non-parametric case by means of confidence belts by Wald and Wolfowitz 
[5]. Problems of testing non-parametric hypotheses have been treated in 
various places. The rank correlation coefficient has been used for a long time 
to test the independence of two variates. Its distribution was shown to be 
asymptotically normal by Hotelling and Pabst [6] and its small sample distribu- 
tion was discussed by Olds [7]. The problem of two samples has been dis- 
cussed, among others, by Thompson [8], Dixon [9] and Wald and Wolfowitz 
[2]. In 1937, Friedman [10] posed the non-parametric analogue of the problem 
in the analysis of variance and proposed a very ingenious solution. 

All these proposed solutions have this in common, that there exists no general 
principle which can be applied in each particular case to obtain a critical region, 
a role which is performed in the parametric case by Fisher’s principle of maxi- 
mum likelihood and the likelihood ratio criterion (Neyman and Pearson, [11]), 
whose validity, at least for large samples, has been established by Wald ([12],
[13]). In each problem the solutions proposed have been intuitive and usually 
based on an analogy to the corresponding problem in the parametric case. Thus 
the principal justification for the use of the rank correlation coefficient is that 
its distribution is independent of the unknown distribution function (under 
the null hypothesis) and that its structure resembles that of the ordinary cor-
relation coefficient. But any function of the order relations among the variates
(cf. [2], p. 148) has a distribution which is independent of the unknown popula-
tion distribution under the null hypothesis. The same objection may be made
to papers [8], [9], [10], [2], except that in [2], although the solution there proposed 
is an intuitive one, the criterion of consistency is extended from the parametric 
case to the non-parametric one. The fulfilment of this condition is a minimal 
requirement of a good test and on this basis the solution proposed in one of the 
previous papers cannot be considered a good one. 

In the following section we shall show that the likelihood ratio criterion may 
be extended to the non-parametric case where the test must be made on the 
order relations among the observations and that for a certain class of these 
problems which fulfill the same requirement as that for the application of the 
likelihood ratio criterion in the parametric case it would thus appear to furnish
a general method by which statistics may be obtained for a specific problem.
We shall show this by applying it to the problem of two samples. This will 
serve to explain the method. Another problem will be discussed later. The 
ultimate justification of any statistic must be its power function, which ought 
therefore to constitute the next subject of investigation for these problems. 
Since for problems in the non-parametric case it is almost certain that uniformly 
most powerful tests do not exist, the question of determining the alternatives 
with respect to which proposed tests are powerful is particularly important. 


5. The problem of two samples. Let X and Y be two stochastic variables 
with the distribution functions f(x) and g(x), respectively. (The term distribu- 
tion function will always denote the cumulative distribution function. The 
letter P followed by an expression in braces will stand for the probability of the 
relation in braces. Hence P{X < x} = f(x) for all x.) f(x) and g(x) are 
assumed continuous. The $n_1$ observations $x_1, x_2, \dots, x_{n_1}$ and $n_2$ observations
$y_1, y_2, \dots, y_{n_2}$ are made on X and Y respectively. The (null) hypothesis
to be tested is that $f(x) \equiv g(x)$. The admissible alternatives are all continuous
distribution functions f(x) and g(x) such that $f(x) \not\equiv g(x)$. The $n_1 + n_2 = n$
observations are arranged in ascending order of size, thus: $Z = z_1, \dots, z_n$
where $z_1 < z_2 < \dots < z_n$ (the probability that $z_i = z_{i+1}$ is 0). Let $V = v_1,
v_2, \dots, v_n$ be a sequence defined as follows: $v_i = 0$ if $z_i$ is a member of the
set $x_1, x_2, \dots, x_{n_1}$, and $v_i = 1$ if $z_i$ is a member of the set $y_1, y_2, \dots, y_{n_2}$.
Then any statistic used to test the null hypothesis must be a function only of V
([2], p. 148).
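
The construction of V from the two samples can be sketched in a few lines of Python. This is only an illustration: the function name `label_sequence` and the sample values are hypothetical, and the sketch assumes there are no ties, as the continuity of f and g guarantees with probability 1.

```python
def label_sequence(xs, ys):
    """Return V: 0 marks an observation on X, 1 an observation on Y,
    listed in ascending order of the pooled sample (ties are assumed absent)."""
    pooled = [(x, 0) for x in xs] + [(y, 1) for y in ys]
    pooled.sort(key=lambda pair: pair[0])
    return [label for _, label in pooled]


# Hypothetical data, chosen only to illustrate the construction.
xs = [0.3, 1.7, 2.2]
ys = [0.9, 1.1, 3.5, 4.0]
print(label_sequence(xs, ys))  # [0, 1, 1, 0, 0, 1, 1]
```

Only the order relations among the observations survive in V; the magnitudes themselves are discarded.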

We now apply the method of Neyman and Pearson [11] as follows: $\Omega$ is the
totality of all couples $(d_1(x), d_2(x))$ of continuous distribution functions. The
set $\omega$, a subset of $\Omega$, is the totality of all couples of distribution functions for which
$d_1 \equiv d_2$. The sample space is the totality of all sequences V. The null hy-
pothesis states that (f, g) is a member of $\omega$. The admissible alternatives are
that (f, g) is a member of $\Omega$ not in $\omega$. The distribution of any function of V
is the same for all members of $\omega$. Hence this essential requirement on the
statistic to be selected for the application of the likelihood ratio criterion (cf.
[11]) is satisfied by any statistic which is a function of V alone. Furthermore,
all sequences V have the same probability if the null hypothesis is true ([2],
p. 149). The numerator of the likelihood ratio is therefore a function only of $n_1$
and $n_2$, is the same for all V, and is therefore of no further interest. Hence
$T'(V)$, a function of V which is a monotonic function of the likelihood ratio
for this problem, may be defined as the denominator of the likelihood ratio,
as follows: Let $P\{V; (d_1, d_2)\}$ be the probability of V when $f \equiv d_1$ and $g \equiv d_2$.
Then

(5.1) $T'(V) = \max_{\Omega} P\{V; (d_1, d_2)\}.$

The critical values of $T'(V)$ are the large values. However, we may use instead
of $T'(V)$ a convenient monotonic function of $T'(V)$.


















As an approximation to $T'(V)$ we propose T(V), a statistic which is obtained
on the assumption that for a given V a couple $(d_1, d_2)$ which is essentially the
same as that of the two sample distribution functions corresponding to the
particular V approximates a couple which maximizes the right member of (5.1).
(We say "a" couple because it cannot be unique.) This assumption seems a
reasonable one, particularly for large samples. Only the form of $(d_1, d_2)$ is
assumed and the missing parameters are obtained in accordance with (5.1).
Before describing the matter precisely, it must be stressed that this is offered 
only as a plausible approximation. For certain extreme V, for example, like 
those where zeros and ones nearly alternate, this is definitely not the maximizing 
couple. In spite of this the statistic T(V) assigns to these V values which are 
furthest removed from the critical region for any level of significance, as indeed 
any good statistic should. 

We first define a "run" as in [2], p. 149. A subsequence $v_{(t+1)}, v_{(t+2)}, \dots,
v_{(t+r)}$ of V (where r may also be 1) is called a "run" if $v_{(t+1)} = v_{(t+2)} = \dots = v_{(t+r)}$
and if $v_t \neq v_{(t+1)}$ when $t > 0$ and if $v_{(t+r)} \neq v_{(t+r+1)}$ when $t + r < n$. Let $l_{1j}$
be the number of elements in the $j$th run of elements 0, and $l_{2j}$ the number of
elements in the $j$th run of elements 1. Suppose for a moment that the first
element in V is a 0. Consider the following situation: There is an interval
$[a_1, a_2]$, $a_1 < a_2$, on the line $-\infty < x < +\infty$ such that

$$P\{a_1 < X < a_2\} > 0, \qquad P\{a_1 < Y < a_2\} = 0,$$
$$P\{X < a_1\} = P\{Y < a_1\} = 0.$$

This is followed by an interval $[b_1, b_2]$, $b_1 = a_2$, such that $P\{b_1 < X < b_2\} = 0$,
$P\{b_1 < Y < b_2\} > 0$. This is in turn followed by an interval $[a_3, a_4]$, $a_3 = b_2$,
such that $P\{a_3 < X < a_4\} > 0$, $P\{a_3 < Y < a_4\} = 0$, etc. It is clear that the
lengths and location of the intervals described are immaterial, provided only 
that they do not overlap. Also the distributions of X and Y within each 
interval are immaterial, provided only that they are continuous. All that 
matters for finding $P\{V; (d_1, d_2)\}$ is that the number and the order of the dis-
junct intervals shall be the same as those of the runs in V (i.e., intervals of
positive probability for X must alternate with intervals of positive probability 
for Y, the number of intervals of positive probability for X and for Y must 
equal respectively the number of runs of the element 0 and the number of runs 
of the element 1, and the probability of the first interval on the left shall be 
positive for X or for ¥ according as the first run in V is of elements 0 or of ele- 
ments 1, with the same relation obtaining between the last interval on the right 
and the last run in V) and the probability of these intervals. Let $P_{1j}$ be the
sought for probability of the interval which corresponds to the $j$th run of ele-
ments 0 and $P_{2j}$ the probability of the interval which corresponds to the $j$th
run of elements 1. In order to obtain V, it is necessary that the elements con-
stituting each run shall fall into its corresponding interval. Then clearly by the
multinomial theorem

(5.2) $P\{V; (d_1, d_2)\} = \prod_i n_i! \prod_j \frac{(P_{ij})^{l_{ij}}}{l_{ij}!}$













where $i = 1, 2$ and where, when $i$ is fixed, the product with respect to $j$ is taken
over all runs of the corresponding element. The right member of (5.2) is to be
maximized with respect to the $P_{ij}$, subject of course to the constraints

(5.3) $\sum_j P_{ij} = 1 \qquad (i = 1, 2).$

Then it may easily be verified that the maximum occurs when

(5.4) $P_{ij} = \frac{l_{ij}}{n_i} \qquad (i = 1, 2).$

For, after multiplying by a constant and taking the logarithm we introduce two
Lagrange multipliers $\mu_1$ and $\mu_2$ so that the maximizing $P_{ij}$ are given by the
equations (5.3) and those obtained by equating to zero all the partial deriva-
tives of

$$\sum_i \sum_j \left( l_{ij} \log P_{ij} - \mu_i P_{ij} \right).$$

The latter are therefore

$$\frac{l_{ij}}{P_{ij}} - \mu_i = 0 \qquad (i = 1, 2)$$

for all $j$, whence (5.4) follows. It is easy to see that the extremum thus ob-
tained is a maximum and also an absolute maximum. The sought-for statistic
T(V) is then the right member of (5.2) after the results (5.4) have been inserted.

It may be simplified by removing all factors which are functions only of $n_1$
and $n_2$ (since these will then be the same for all V) and recalling that

(5.5) $\sum_j l_{ij} = n_i \qquad (i = 1, 2).$

It will be convenient to take the logarithm of the resulting expression, so that
with a slight change of notation we finally have

(5.6) $T(V) = \sum_i \sum_j \bar{l}_{ij}$

where

(5.7) $\bar{l}_{ij} = \log \frac{l_{ij}^{\,l_{ij}}}{l_{ij}!}.$

This result is immediately extensible to the problem of k samples and by way 
of summary we recapitulate it as follows: 

Let there be given k stochastic variables $X_1, \dots, X_k$ with the respective
distribution functions $f_1(x), \dots, f_k(x)$, about which nothing is known except
that they are continuous. Random independent observations, $n_i$ in number,
are made on $X_i$ ($i = 1, \dots, k$). It is desired to test the hypothesis that
$f_1 \equiv f_2 \equiv \dots \equiv f_k$, the admissible alternatives being all k-tuples of continuous
distribution functions. The sequence V is obtained from the sequence Z by
replacing an observation on $X_i$ by the element $i$. Let $l_{ij}$ be the number of
elements in the $j$th run of elements $i$. Then the corresponding statistic for
testing the null hypothesis is $T_k(V)$ or any monotonic function of it, where

$$T_k(V) = \sum_{i=1}^{k} \sum_j \bar{l}_{ij}$$

and $\bar{l}_{ij} = \log\left(l_{ij}^{\,l_{ij}} / l_{ij}!\right)$ is given by (5.7). The large values of $T_k(V)$ are the critical values.


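Let $r_{ij}$ denote the number of runs of length $j$ in the elements $i$. Let
$\sum_j r_{ij} = r_i$. Of course $\sum_j j\, r_{ij} = n_i$. Also let $s_j = \sum_i r_{ij}$. Then,
writing $\bar{j} = \log(j^j / j!)$ for the number (5.7) belonging to a run of length $j$,

(5.8) $T_k(V) = \sum_i \sum_j \bar{j}\, r_{ij}$

and

(5.9) $T_k(V) = \sum_j \bar{j}\, s_j.$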


If a table were constructed of the numbers (5.7) from 1 to 50, say, or from 1 
to 100, this would cover most of the cases arising in practice. The calculation 
of $T_k(V)$ by means of (5.9) would then be so simple that it could be performed
very expeditiously by an ordinary clerk and with very much less labor than is
required for most statistics in common use, like the correlation coefficient, for
example. As a matter of interest we note that

$\bar{1}$ = 0
$\bar{2}$ = .693
$\bar{3}$ = 1.50
$\bar{4}$ = 2.37
$\bar{5}$ = 3.26

and that

(5.10) $\bar{p} < p$

where $p$ is any integer > 1. (5.10) follows from the fact that

$$p! > \left(\sqrt{2\pi p} - 1\right) p^p e^{-p}.$$

The distribution of T(V) may be found for small samples by enumerating
the sequences V, all of which have the same probability under the null hypothesis,
and assigning to each V its T(V). The critical region consists of the V's for
which T(V) takes the largest values, taken in sufficient number to make the
critical region of proper size. It will not be necessary to enumerate all the
V's, since it is readily apparent that certain V's can never belong to a critical
region of any reasonable size. (Roughly speaking, a V with a large number of
runs of short length will yield a small T(V) and vice versa.) For large samples,
the result of Section 3 is available, with $f(x) = \bar{x}$, where $\bar{x} = \log(x^x / x!)$
is the quantity (5.7). From (5.10) it follows
easily that the corresponding series (3.9) is convergent, so that T(V) is asymptot-
ically normally distributed. It must be remembered when using tables of the
normal distribution that the critical region of T(V) lies in only one "tail" of
the normal curve. The greatest difficulty will occur for samples of moderate 
size. Methods like those of Olds [7] will probably help there. It is highly 
unlikely that any practicable formula which would give the exact distribution 
of T(V) exists. 
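
For small samples the enumeration described above can be carried out mechanically. The sketch below is an illustration under stated assumptions, not the author's procedure: it enumerates every sequence V with `n1` zeros and `n2` ones (all equally likely under the null hypothesis), evaluates the sum of $\log(l^l/l!)$ over runs for each, and reports the upper-tail probability of an observed value. The function names are hypothetical, and the full enumeration is only practical for small `n1 + n2`.

```python
from itertools import combinations, groupby
from math import lgamma, log


def bar(length):
    """log(length**length / length!) for one run."""
    return length * log(length) - lgamma(length + 1)


def t_statistic(v):
    """Sum of bar(l) over the runs of v."""
    return sum(bar(len(list(group))) for _, group in groupby(v))


def null_distribution(n1, n2):
    """Sorted values of the statistic over all equally likely sequences V."""
    n = n1 + n2
    values = []
    for ones in combinations(range(n), n2):
        v = [0] * n
        for position in ones:
            v[position] = 1
        values.append(round(t_statistic(v), 10))
    return sorted(values)


def upper_tail_probability(n1, n2, observed):
    """Probability, under the null hypothesis, of a value at least `observed`."""
    values = null_distribution(n1, n2)
    hits = sum(1 for t in values if t >= observed - 1e-9)
    return hits / len(values)


# With three observations in each sample, only 000111 and 111000 reach the maximum.
print(upper_tail_probability(3, 3, 2 * bar(3)))  # 0.1
```

The small tolerance in `upper_tail_probability` guards against floating-point noise when comparing sums of logarithms.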

A few brief remarks may be made here on a related problem. Suppose we 
have observations from two bivariate populations about the distributions of 
both of which nothing is known except that they are continuous and it is sought 
to test whether the two populations have the same distribution functions. 
Suppose further that it were required that the statistic used for this purpose be 
invariant under any topologic transformation of the whole plane into itself. 
At this point we quote the following topologic theorem, the proof of which was
communicated to the author by Dr. Herbert Robbins: Let $x_1, y_1, x_2, y_2, \dots,
x_p, y_p$ be any 2p distinct points in the plane. There exists a topologic transforma-
tion of the whole plane into itself which takes $x_i$ into $y_i$ ($i = 1, 2, \dots, p$). As a
consequence of this theorem we get the absurd result that the required statistic
must be a constant. Hence this statistical problem can have no solution. 

As a matter of interest this statistical problem would have no solution even 
if it were not for the topologic theorem. The fact is that a continuous distribu- 
tion on a line remains continuous under a topologic transformation of the whole 
line into itself, but a continuous distribution in a k-dimensional (Euclidean) 
space (k > 1) may become discontinuous under a topologic transformation of 
the whole space into itself. (The probability distribution in the first space 
always determines a probability distribution in the transformed space, for 
probability functions are defined over all Borel sets of the space (cf. [15], p. 7)
and a topologic transformation carries Borel sets into Borel sets (cf. [16], p. 195,
Theorem II)). Consider the following example in the plane: A bivariate
distribution function assigns probability 1 to a line L oblique to the coordinate 
axes, while any interval which contains no segment of the line L has probability 
0. On the line L the (one-dimensional) probability distribution may be ar- 
bitrary, provided it is continuous. The bivariate distribution function is 
without difficulty seen to be continuous. Now rotate the coordinate axes until
one of them is parallel to L. It is easy to see that after the rotation the bivariate 
distribution function is discontinuous. 

The question of whether a useful statistical problem could be obtained by 
properly delimiting the class of transformations which are to leave the statistic 
invariant and the solution of such a problem remain to be investigated. 




































6. The problem of the independence of several variates. This is an important 
practical problem and one of the earliest discussed in the literature (cf., for 
example, [6]). Let $X_1$ and $X_2$ be stochastic variables with the joint (cumulative)
distribution function $F(x_1, x_2)$ which is known to be continuous in both variables
jointly (i.e., $F(x_1, x_2) = P\{X_1 < x_1;\ X_2 < x_2\}$, where the right member is the
probability of the occurrence of both the relations in braces). The marginal
distributions $f_1(x_1)$ and $f_2(x_2)$ of $X_1$ and $X_2$ respectively are defined as follows:

$$f_1(x_1) = P\{X_1 < x_1\} = \lim_{x_2 \to +\infty} F(x_1, x_2),$$

$$f_2(x_2) = P\{X_2 < x_2\} = \lim_{x_1 \to +\infty} F(x_1, x_2).$$

(It is easy to see that the continuity of $F(x_1, x_2)$ implies the continuity of $f_1(x_1)$
and $f_2(x_2)$.)

The n random, independent pairs of observations $x_{11}, x_{21};\ \dots;\ x_{1n}, x_{2n}$ are
made on $X_1$ and $X_2$. The null hypothesis states that
(6.1) $F(x_1, x_2) \equiv f_1(x_1) \cdot f_2(x_2)$
i.e., that $X_1$ and $X_2$ are independent. The alternative hypotheses are that
$F(x_1, x_2)$ does not satisfy (6.1).¹

Let the set $x_{11}, x_{12}, x_{13}, \dots, x_{1n}$ be arranged in order of ascending size, thus:
$Z = z_1, z_2, z_3, \dots, z_n$ where $z_1 < z_2 < \dots < z_n$. The $j$th member of this
sequence will be said to have the rank $j$. In the same manner ranks are assigned
to the $x_{2j}$ ($j = 1, \dots, n$). (It is easy to see that, since $f_1(x_1)$ and $f_2(x_2)$ are
continuous, the probability that $z_i = z_{i+1}$ is 0, etc.) In the sequence Z the
element $z_j$ ($j = 1, \dots, n$) is replaced by the rank of its associated observation
on $X_2$. We obtain a permutation of the integers $1, 2, \dots, n$ which we denote
by R. If in the procedure for obtaining R, we had reversed the roles of the $x_{1j}$
and $x_{2j}$, we would have obtained the permutation R′. It is easy to see that
any statistic, say $M'$, used to test the null hypothesis, must be a function only
of R, with the added proviso that $M'(R) = M'(R')$. (The rank correlation
coefficient is such a statistic.) Under the null hypothesis all the R have the
same probability $\left(= \frac{1}{n!}\right)$.


¹ It is easy to see that the independence or dependence of two stochastic variables is not
a property which will remain invariant under a topologic transformation of the plane into
itself. We therefore require of the statistic only that it be invariant under topologic trans-
formation of each variable into itself, separately.

The procedure of applying the likelihood ratio principle to this problem would
then be as follows: $\Omega$ is the totality of all bivariate distribution functions
$H(x_1, x_2)$ which are continuous in both variables jointly. The respective mar-
ginal distributions corresponding to $H(x_1, x_2)$ will be denoted by $h_1(x_1)$ and $h_2(x_2)$.
$\omega$ is a subset of $\Omega$ which consists of all $H(x_1, x_2)$ for which $H(x_1, x_2) \equiv h_1(x_1) \cdot h_2(x_2)$.
The sample space is the totality of all sequences R. The null hypothesis states
that $F(x_1, x_2)$ is a member of $\omega$. The admissible alternatives are that $F(x_1, x_2)$
is a member of $\Omega$ not in $\omega$. The distribution of any function of R is the same
for all members of $\omega$. Thus the essential requirement for the applicability of
the likelihood ratio criterion is fulfilled. All sequences R have the same proba-
bility for all members of $\omega$; hence the numerator of the likelihood ratio is a func-
tion only of n which may therefore be ignored. We may then define $M'(R)$,
a monotonic function of the likelihood ratio, as the denominator of the likeli-
hood ratio, thus:


(6.2) $M'(R) = \max_{\Omega} P\{R; H(x_1, x_2)\}$

where $P\{R; H(x_1, x_2)\}$ is the probability of R when $H(x_1, x_2)$ is the joint distri-
bution function of $X_1$ and $X_2$. The critical values of $M'(R)$ are the large values.
We now propose an approximation to M’(R) which we shall call M(R). We 
do this by describing a distribution function $H^*(x_1, x_2)$ for each R which seems
a plausible approximation to a maximizing distribution function. It may be 
derived from certain assumptions about the nature of the maximizing distribu- 
tion function which we omit. The remarks made in the preceding section about 
the character of the approximation apply here as well. As before we specify 
only the form of the function and leave certain parameters, finite in number, 
to be determined in accordance with (6.2). (If the construction of $H^*(x_1, x_2)$
should appear somewhat involved, this is due only to the analytic description.
A sketch will show the essential simplicity of the situation.) We then have

$$M(R) = P\{R; H^*(x_1, x_2)\}.$$


Let $R = a_1, a_2, \dots, a_n$ be a given permutation of the integers 1 to n. A
sub-sequence $a_{(t+1)}, a_{(t+2)}, \dots, a_{(t+l)}$ will be called a run of length $l$ if the
following conditions are fulfilled:

(6.3) The indices of the a's are consecutive,
(6.4) If $l'$ is any integer such that $1 \le l' < l$, then $|a_{(t+l')} - a_{(t+l'+1)}| = 1$,
(6.5) if $t > 0$, $|a_t - a_{(t+1)}| > 1$,
(6.6) if $t + l < n$, $|a_{(t+l)} - a_{(t+l+1)}| > 1$.

The run will be called an ascending run or a descending run according as
$a_{(t+1)} - a_{(t+2)} = -1$ or $+1$. A run of length 1 is of either type, at pleasure.
For example, let

R = 5, 6, 1, 4, 3, 2.

The first run is 5, 6, the second 1, the last 4, 3, 2. 5, 6 is an ascending run of
length two, 4, 3, 2 a descending run of length three, and 1 a run of length one.
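
The run decomposition of a permutation can be sketched as follows. The function names are hypothetical. The sketch relies on the observation that, since the entries of a permutation are distinct, a maximal block in which neighbouring entries differ by exactly 1 is necessarily monotone and therefore satisfies (6.3) to (6.6).

```python
def permutation_runs(r):
    """Split a permutation into maximal blocks whose neighbours differ by exactly 1."""
    runs = [[r[0]]]
    for previous, current in zip(r, r[1:]):
        if abs(current - previous) == 1:
            runs[-1].append(current)   # the run continues
        else:
            runs.append([current])     # a gap larger than 1 starts a new run
    return runs


def run_type(run):
    """Classify a run; a run of length one may be taken as either type."""
    if len(run) == 1:
        return "either"
    return "ascending" if run[1] - run[0] == 1 else "descending"


r = [5, 6, 1, 4, 3, 2]
for run in permutation_runs(r):
    print(run, run_type(run))
# [5, 6] ascending
# [1] either
# [4, 3, 2] descending
```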

$H^*(x_1, x_2)$ is a degenerate distribution function such that the relation between
$X_1$ and $X_2$ is functional (this is a special case of stochastic relationship). That
is to say, $X_2 = \varphi(X_1)$, where $\varphi(X_1)$ is a single-valued function defined for all the
possible values of $X_1$, with a single-valued inverse $\varphi^{-1}(X_2)$ defined for all possible
values of $X_2$. Hence $H^*(x_1, x_2)$ is completely specified when the function
$X_2 = \varphi(X_1)$ and $h_1^*(x_1)$, the marginal distribution function of $X_1$, are given
($h_1^*(x_1)$ must of course be continuous).

Consider a system of intervals on the line $-\infty < x_1 < +\infty$ of which $(i - 1, i)$
is the $i$th, $i = 1, 2, \dots, n$, and a similar system on the line $-\infty < x_2 < +\infty$.
(Actually, as in the previous section, neither the length of the intervals nor
their location is material. The intervals need merely be disjunct and in a certain
order. We are using these particular intervals to simplify the notation.) Let
$l_1$ be the length of the first run. $a_1$ is its first element. Then let

$$p_1 = P\{0 < X_1 < l_1;\ h_1^*(x_1)\}$$

be one of the as yet undetermined parameters. We now partly define $h_1^*(x_1)$
as follows:

(6.7) $h_1^*(x_1) = 0,\ x_1 \le 0; \qquad h_1^*(x_1) = 1,\ x_1 \ge n; \qquad h_1^*(l_1) = p_1.$

Within the interval $(0, l_1)$, $h_1^*(x_1)$ may be any continuous monotonic increasing
function which satisfies (6.7). We partly define $\varphi(X_1)$ as follows:

If the first run is ascending, let

(6.8) $\varphi(0) = a_1 - 1$
(6.9) $\varphi(x_1) = a_1 - 1 + x_1, \qquad 0 < x_1 \le l_1.$

If the first run is descending, let

(6.10) $\varphi(0) = a_1$
(6.11) $\varphi(x_1) = a_1 - x_1, \qquad 0 < x_1 \le l_1.$


We proceed in this manner through all the runs of R. Let $l_j$ be the length of
the $j$th run. Let $A_j = \sum_{i<j} l_i$. The first element of the $j$th run is $a_{(A_j+1)}$. Let

$$p_j = P\{A_j < X_1 < A_j + l_j;\ h_1^*(x_1)\}$$

be another of the as yet undetermined parameters. We then define $h_1^*(x_1)$ as
follows:

(6.12) $h_1^*(A_j) = \sum_{i<j} p_i$
(6.13) $h_1^*(A_j + l_j) = \sum_{i \le j} p_i$

Within the interval $(A_j, A_j + l_j)$, $h_1^*(x_1)$ may be any continuous monotonic in-
creasing function which satisfies (6.12) and (6.13). We define $\varphi(X_1)$ as follows:
If the $j$th run is ascending, let

(6.14) $\varphi(x_1) = a_{(A_j+1)} - 1 + (x_1 - A_j) \qquad (A_j < x_1 \le A_j + l_j).$

If the $j$th run is descending, let

(6.15) $\varphi(x_1) = a_{(A_j+1)} - (x_1 - A_j) \qquad (A_j < x_1 \le A_j + l_j).$

If $l_j = 1$, the run may be considered ascending or descending at pleasure.
















In order to obtain R, it is necessary that all the elements of a run shall fall 
into its corresponding interval. Then it is easy to see that by the multinomial 
theorem 


(6.16) $P\{R; H^*(x_1, x_2)\} = n! \prod_j \frac{p_j^{\,l_j}}{l_j!}.$

The right member of (6.16) is to be maximized with respect to the $p_j$ subject to
the constraint

(6.17) $\sum_j p_j = 1.$

It is easy to verify that the maximum occurs when

(6.18) $p_j = \frac{l_j}{n}.$

M(R) is the right member of (6.16) after the results (6.18) have been inserted.
It is convenient to remove all factors which are functions only of n and to take
the logarithm of the resulting expression. Then with a slight change of nota-
tion we may say that

(6.19) $M(R) = \sum_j \bar{l}_j$

where

(6.20) $\bar{l}_j = \log \frac{l_j^{\,l_j}}{l_j!}.$

The critical values of M(R) are the large values. One may verify without much
difficulty that M(R) = M(R′), i.e., that the statistic is symmetric with respect
to $X_1$ and $X_2$ as indeed it should be.
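
The symmetry can also be checked numerically. In the sketch below (hypothetical names throughout) M(R) is taken as the sum of $\log(l^l/l!)$ over the runs of R, and R′ is computed as the inverse permutation of R. That identification is an inference from the definitions rather than a statement of the paper: R lists the $X_2$-ranks in order of the $X_1$-ranks, so reversing the roles of the two variables lists the $X_1$-ranks in order of the $X_2$-ranks.

```python
from itertools import permutations
from math import isclose, lgamma, log


def bar(length):
    """log(length**length / length!) for one run."""
    return length * log(length) - lgamma(length + 1)


def permutation_runs(r):
    """Maximal blocks of r whose neighbouring entries differ by exactly 1."""
    runs = [[r[0]]]
    for previous, current in zip(r, r[1:]):
        if abs(current - previous) == 1:
            runs[-1].append(current)
        else:
            runs.append([current])
    return runs


def m_statistic(r):
    """Sum of bar(l) over the runs of the permutation r."""
    return sum(bar(len(run)) for run in permutation_runs(r))


def inverse_permutation(r):
    """R' under the assumption stated above: position and rank exchange roles."""
    inverse = [0] * len(r)
    for position, rank in enumerate(r, start=1):
        inverse[rank - 1] = position
    return inverse


r = [5, 6, 1, 4, 3, 2]
print(inverse_permutation(r))  # [3, 6, 5, 4, 1, 2]
print(isclose(m_statistic(r), m_statistic(inverse_permutation(r))))  # True

# Exhaustive check for n = 6.
print(all(
    isclose(m_statistic(list(p)), m_statistic(inverse_permutation(list(p))), abs_tol=1e-9)
    for p in permutations(range(1, 7))
))  # True
```

The check succeeds because a pair of adjacent positions carrying consecutive values in R becomes a pair of consecutive values at adjacent positions in the inverse, so runs are carried to runs of the same length.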

This result is immediately extensible to the problem of testing whether k 
stochastic variables $X_1, \dots, X_k$ are independent. We shall not go into the
details, which are similar to those described above, and content ourselves with
giving the definition of a run for the case k = 3. After the observations on $X_1$
have been arranged in ascending order, we obtain two sequences $R_2$ and $R_3$,
the associated ranks of the observations on $X_2$ and $X_3$. Let $R_2 = b_1, b_2, \dots, b_n$
and $R_3 = b_1', b_2', \dots, b_n'$. The ascending sequence of consecutive integers
$(t + 1), (t + 2), \dots, (t + l)$ determines a run of length $l$ if the sequences
$b_{(t+1)}, b_{(t+2)}, \dots, b_{(t+l)}$ and $b'_{(t+1)}, b'_{(t+2)}, \dots, b'_{(t+l)}$ both satisfy (6.4), and if at
least one of the sequences satisfies (6.5), and at least one, but not necessarily
the same one, satisfies (6.6). The adjectives ascending and descending apply
to each sequence separately.

Let $r_j$ be the number of runs of length $j$ in R. Then, with $\bar{j} = \log(j^j / j!)$,

(6.21) $M(R) = \sum_j \bar{j}\, r_j.$


Most of the remarks made in Section 5 about the small sample distribution of 
T(V) are also applicable to the distribution of M(R). More will be said in the
next section about the distribution of M(R) which involves the solution of a
combinatorial problem not discussed in the literature.


7. On the distribution of W(R). While most of the remarks made about the
small sample distribution of 7(V) apply to the question of the distribution of 
M(R) in small samples, the situation with respect to the distribution of M(R) 
in samples of medium size and large size is very different and, in certain respects, 
is more favorable for practical application than is the case with T(V). It would 
be reasonable to expect, for example, in view of Section 3 and of the structure 
of the statistic M(R) that the asymptotic distribution of M(R) should be normal.
Surprisingly enough, this is not the case. It is not even continuous. In order 
to clarify the situation, we begin with a few necessary ideas and definitions. 

Let the stochastic variable W(R) be defined as the total number, in R, of
runs in the sense of Section 6. We shall be interested in the distribution of W(R).
The number n of the pairs of observations on $X_1$ and $X_2$ (we consider the case of
two variates) will be assumed arbitrary but fixed throughout the discussion and 
will not be exhibited. Let N(k) be the number of sequences R (of the integers 
1 to n) which contain exactly k runs. 

Consider, for example, for the case n = 6, the sequence 2 3 4 6 5 1. We
shall say that this sequence contains the "contacts" (2, 3), (3, 4), (6, 5). In
general, a contact is defined as the juxtaposition, in the sequence R, of consecu-
tive numbers, whether in ascending or descending order. If $k$ is the number of
runs and $l$ the number of contacts in a sequence R, then obviously

(7.1) $k + l = n$.
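
The relation between runs and contacts is easy to confirm by machine. The sketch below (hypothetical function names) lists the contacts of a sequence, counts its runs independently by looking for breaks, and checks that the two counts add up to n for every permutation of 1 to 5.

```python
from itertools import permutations


def contacts(r):
    """Adjacent pairs of r whose entries are consecutive integers, in either order."""
    return [(a, b) for a, b in zip(r, r[1:]) if abs(a - b) == 1]


def count_runs(r):
    """Number of runs: one, plus one more for every adjacent pair that is not a contact."""
    return 1 + sum(1 for a, b in zip(r, r[1:]) if abs(a - b) != 1)


r = [2, 3, 4, 6, 5, 1]
print(contacts(r))    # [(2, 3), (3, 4), (6, 5)]
print(count_runs(r))  # 3

# Runs plus contacts equal n for every permutation of 1..5.
print(all(count_runs(p) + len(contacts(p)) == 5 for p in permutations(range(1, 6))))  # True
```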


Let $R_0$ be the sequence $1, 2, \dots, n$ of the first n integers in ascending order.
The $n - 1$ contacts of this sequence may themselves be arranged in a sequence
R* of contacts, thus:

$$(1, 2),\ (2, 3),\ \dots,\ (n - 1, n).$$

Suppose $l$ of the contacts which constitute the sequence R* are selected in some
manner to form the set O. The remaining $n - 1 - l$ contacts form the comple-
mentary set O′. After this selection the sequence R* may be considered a
sequence of the type of the sequences V of Section 5 with the members of O
playing the role of the elements 0 and the members of O′ playing the role of the
elements 1. When R* is considered in this manner we will write it as R*(O).
The definition of a run of Section 5 as applied to sequences V is now applicable
to R*(O). We will call any such run of the members of O or of O′ a group.
We wish first to answer the following question: In how many ways can the
set O be selected from among the elements of R* so that it will contain $l$ mem-
bers arranged in R*(O) in $i$ groups? If, for a given O, $i'$ be the number of
groups into which O′ is divided in R*(O), it is clear that $i - i'$ can equal only
$-1$, 0, or $+1$. Hence only four situations can arise, as follows:
a) $i' = i + 1$. The first group in R*(O) is therefore composed of elements of
O′. The number of ways in which $l$ elements can be divided into $i$ runs of the
type of Section 2 is the coefficient of $x^l$ in the purely formal expansion of

$$(x + x^2 + x^3 + \cdots)^i = \left(\frac{x}{1 - x}\right)^i$$

and is therefore $\binom{l-1}{i-1}$. Similarly $n - l - 1$ elements can be divided into
$i' = i + 1$ runs in $\binom{n-l-2}{i}$ ways. Hence this situation will arise in

$$\binom{l-1}{i-1}\binom{n-l-2}{i} \text{ ways.}$$

b) $i' = i - 1$. By a similar argument as above, this can occur in

$$\binom{l-1}{i-1}\binom{n-l-2}{i-2} \text{ ways.}$$

c) $i' = i$ and the first group is made up of elements from O. This will occur in

$$\binom{l-1}{i-1}\binom{n-l-2}{i-1} \text{ ways.}$$

d) $i' = i$ and the first group is made up of elements from O′. This will also
occur in

$$\binom{l-1}{i-1}\binom{n-l-2}{i-1} \text{ ways.}$$

The set O which contains $l$ elements arranged in $i$ groups can therefore be
selected in

(7.2) $\binom{l-1}{i-1}\left[\binom{n-l-2}{i} + 2\binom{n-l-2}{i-1} + \binom{n-l-2}{i-2}\right]$

ways, and the quantity (7.2) is, by elementary combinatorics, equal to

(7.3) $\binom{l-1}{i-1}\binom{n-l}{i}.$
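
The count of selections can be confirmed by brute force for small n. The sketch below (the function name `count_selections` is hypothetical) enumerates every way of choosing $l$ of the $n - 1$ contacts of R*, counts the groups of adjacent chosen contacts, and compares the tally with the closed form $\binom{l-1}{i-1}\binom{n-l}{i}$. It uses `math.comb`, which returns 0 when the lower index exceeds the upper one.

```python
from itertools import combinations
from math import comb


def count_selections(n, l, i):
    """Number of sets of l contacts, out of the n - 1 contacts of R*, forming exactly i groups."""
    total = 0
    for chosen in combinations(range(n - 1), l):
        # A new group starts wherever two chosen contacts are not neighbours in R*.
        groups = 1 + sum(1 for a, b in zip(chosen, chosen[1:]) if b - a > 1)
        if groups == i:
            total += 1
    return total


n = 7
print(all(
    count_selections(n, l, i) == comb(l - 1, i - 1) * comb(n - l, i)
    for l in range(1, n)
    for i in range(1, l + 1)
))  # True
```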


Let any set O of $l$ contacts divided into $i$ groups be selected from R*. Imagine
that each contact in O sets up, in $R_0$, an unbreakable bond which links the two
elements involved in the contact, but no contact in O′ creates such a bond.
Given these bonds set up by O, we seek the number of different sequences into
which the n elements of $R_0$ can be permuted while respecting these bonds.
Since there are $l$ bonds, we can actually manipulate only $n - l$ entities, except
that two elements linked by a bond may have their order reversed; for example,
if O contains (1, 2), 1 may either precede or follow 2 and the bond would still
be respected. However, if one contact in a group is reversed, the group as a
whole must be reversed, else a bond would be broken. Hence the number of
distinct sequences into which $R_0$ may be permuted while all the bonds set up
by O are respected is $2^i (n - l)!$.
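
The size of a family can likewise be verified by enumeration. In the sketch below (hypothetical names) a set O is represented by the smaller member $a$ of each chosen contact $(a, a + 1)$, and a permutation respects the bonds when every bonded pair occupies adjacent positions. The count is compared with $2^i (n - l)!$ for a set of $l$ contacts forming $i$ groups.

```python
from itertools import permutations
from math import factorial


def family_size(n, bonds):
    """Number of permutations of 1..n in which a and a + 1 are adjacent for every a in bonds."""
    count = 0
    for r in permutations(range(1, n + 1)):
        position = {value: index for index, value in enumerate(r)}
        if all(abs(position[a] - position[a + 1]) == 1 for a in bonds):
            count += 1
    return count


# O = {(1, 2), (2, 3), (4, 5)} in n = 6: l = 3 contacts forming i = 2 groups.
print(family_size(6, {1, 2, 4}), 2 ** 2 * factorial(6 - 3))  # 24 24
```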

Let us refer to the sequences thus obtained as the family generated by O. 
All the sequences in a family are distinct. Now let O range over all sets of $l$
contacts selected from R*. The various families obtained will not be disjunct,
but some will have sequences in common. In spite of this, we seek the total
of the number of sequences in all the families. The total of the number of
sequences in all the families generated by sets of $l$ contacts divided into $i$ groups
is, by (7.3) and the result of the preceding paragraph,

(7.4) $2^i \binom{l-1}{i-1}\binom{n-l}{i}(n - l)!.$

Sets of $l$ contacts may consist of $1, 2, \dots, l$ groups, so that the total number of
sequences in all the families generated by sets of $l$ contacts is

(7.5) $A_l = \sum_{i=1}^{l} 2^i \binom{l-1}{i-1}\binom{n-l}{i}(n - l)!$

where $l$ may take the values $1, 2, \dots, (n - 1)$. The conventions on the combi-
natorial symbols will be:


Define $A_0$ as

$$A_0 = n!.$$

The following equation is trivial:
(7.7) $A_0 = \sum_{i=1}^{n} N(i).$


We now consider all the families generated by sets O which contain exactly $l$
contacts. As was said before, the total of the number of sequences in each is $A_l$.
Let H(l) be the set of all the sequences in all these families, with each sequence
in H(l) counted as many times as the number of families in which it occurs.
Every sequence in H(l) has the $l$ contacts of the set O which generated it, but
after permuting $R_0$ other contacts may still exist. Hence every sequence in
H(l) has at least $l$ contacts and therefore by (7.1), at most $n - l$ runs. Clearly,
a sequence which has exactly $l$ contacts occurs exactly once in H(l), since it
can appear only in the family generated by the set O of its $l$ contacts and in no
other family. A sequence which has exactly $(l + 1)$ contacts will appear
exactly $\binom{l+1}{l}$ times in H(l), for it will appear once in each family generated by
a set O which consists of one of the $\binom{l+1}{l}$ selections of $l$ contacts from among
its $(l + 1)$ contacts, and in no other family. Similarly each sequence which has
exactly $(l + 2)$ contacts will appear in H(l) $\binom{l+2}{l}$ times, and so forth. We
therefore obtain, in view of (7.1),





278 J. WOLFOWITZ 


(7.8) A, = : (;) N(n — 1) (2 = 1,2,---,(m—1)). 


i=l 


The system of n linear equations (7.7) and (7.8) completely determines the
quantities N(1), N(2), ..., N(n). The matrix of these equations has a
determinant whose absolute value is 1, so that the quantities N(1), N(2), ..., N(n)
may readily be expressed in determinantal form. Furthermore the moments
of W(R) are readily found from these equations. Thus from (7.8) for l = 1
we find

(7.9)    $E(W(R)) = \frac{n^2 - 2n + 2}{n} \sim n - 2$


and from (7.8) for l = 2 and l = 1 we find, after a little obvious manipulation,


(7.10)    $\sigma^2(W(R)) = \frac{2n^3 - 8n^2 + 6n + 4}{n^2 (n-1)}$.






Higher moments of W(R) may be found in similar manner. 
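
The relations (7.5), (7.7), (7.8), (7.9) and (7.10) can be checked by direct enumeration for small n. The following is only a sketch under stated assumptions: a contact is taken to be a pair of adjacent elements differing by 1, the number of runs is taken to be n minus the number of contacts, and $A_0$ is taken to be n!, the total number of permutations. The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def contacts(seq):
    """Count adjacent pairs whose values differ by exactly 1 (assumed meaning of 'contact')."""
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(a - b) == 1)


def run_counts_brute_force(n):
    """N[i] = number of permutations of 1..n with exactly i runs (runs = n - contacts)."""
    N = [0] * (n + 1)
    for p in permutations(range(1, n + 1)):
        N[n - contacts(p)] += 1
    return N


def A(l, n):
    """A_l as in (7.5); A_0 is assumed to equal n!."""
    if l == 0:
        return factorial(n)
    groups = sum(2**i * comb(l - 1, i - 1) * comb(n - l, i) for i in range(1, l + 1))
    return groups * factorial(n - l)


def run_counts_from_system(n):
    """Solve (7.7)-(7.8) by back-substitution, from l = n - 1 down to l = 0."""
    N = [0] * (n + 1)
    for l in range(n - 1, -1, -1):
        # A_l = sum_{i=l}^{n-1} C(i, l) N(n - i); the i = l term has coefficient 1.
        known = sum(comb(i, l) * N[n - i] for i in range(l + 1, n))
        N[n - l] = A(l, n) - known
    return N


def mean_and_variance(N, n):
    """Exact mean and variance of the number of runs under equally likely permutations."""
    total = factorial(n)
    mean = Fraction(sum(i * N[i] for i in range(n + 1)), total)
    second = Fraction(sum(i * i * N[i] for i in range(n + 1)), total)
    return mean, second - mean**2
```

Because the system is triangular with unit diagonal, the back-substitution stays in exact integer arithmetic, which is in line with the remark that the determinant has absolute value 1.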

Since the limiting variance of W(R) is 2 it follows that the asymptotic
distribution is not continuous. For n of any size the bulk of the values are
concentrated in a short interval ending at n. When W(R) = n, M(R) = 0; when
W(R) = n - 1, M(R) = log 2; and when W(R) = n - 2, M(R) = log 4½ or
log 4. It is easy to see that for the values of W(R) which differ very little
from n there are only a small number of values of M(R), whose asymptotic
distribution is also discontinuous. The statistic W(R) is therefore a good
approximation to the statistic M(R) for the purposes of tests of significance
(for M(R) the large values are the critical values and for W(R) the small values 
are critical), and has a few additional practical advantages. It is even easier 
to compute than M(R); the computation is best performed by counting con- 
tacts. Since the limiting variance is a small constant, it follows that many 
tests of significance can be performed simply by use of Tchebycheff’s inequality. 
For example, suppose a given large sample contains 9 contacts, i.e., n — 9 
runs (we say a “large” sample in order to use the simple limiting mean and 
variance; if desired or for a small sample these latter may be computed exactly 
by (7.9) and (7.10)). Then by Tchebycheff’s inequality it follows that the 
probability of obtaining n — 9 or fewer runs is less than .041. Thus the presence 
of 9 contacts would be sufficient to render a sample of great size significant on a 
5% level. For the few numbers of contacts about which doubt will exist as to 
whether or not they are critical values two procedures are possible. Either the 
equations (7.7) and (7.8) may be solved exactly for the doubtful values, or 
several higher moments may be found from (7.8) and the methods of Wald [14] 
can be applied to delimit the missing probabilities to any accuracy desired. By 
enumerating the few values of M(R) which correspond to several of the largest
values of W(R) the distribution of M(R) may be computed sufficiently to serve
the purposes of tests of significance.
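
The Tchebycheff computation quoted above (9 contacts giving a probability below .041) can be reproduced with a few lines. This is a minimal sketch that assumes the limiting mean and variance of the number of contacts are both 2, as follows from (7.9) and (7.10) for large n; the function name is illustrative.

```python
def chebyshev_tail_bound(contacts, mean=2.0, variance=2.0):
    """Tchebycheff upper bound on P(number of contacts >= contacts).

    The defaults are the limiting mean and variance of the number of
    contacts; for a small sample they could be replaced by exact values.
    """
    deviation = contacts - mean
    if deviation <= 0:
        return 1.0  # the inequality gives no information at or below the mean
    return min(1.0, variance / deviation**2)


# Smallest number of contacts that this crude bound places below the 5% level.
smallest_significant = next(k for k in range(3, 100) if chebyshev_tail_bound(k) < 0.05)
```

With 9 contacts the deviation from the mean is 7, so the bound is 2/49, which is the figure behind the ".041" in the text; with 8 contacts the bound is 2/36 and exceeds .05.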





























REFERENCES 


[1] Maurice Fréchet, Généralités sur les Probabilités, Variables Aléatoires, Paris (1937).
[2] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 11 (1940), p. 147.
[3] A. M. Mood, Annals of Math. Stat., Vol. 11 (1940), p. 367.
[4] J. Neyman, Phil. Trans. Roy. Soc. London, Vol. 231 (1937), pp. 333-380.
[5] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 10 (June, 1939), p. 105.
[6] H. Hotelling and M. Pabst, Annals of Math. Stat., Vol. 7 (1936), p. 29.
[7] E. G. Olds, Annals of Math. Stat., Vol. 9 (1938), p. 133.
[8] W. R. Thompson, Annals of Math. Stat., Vol. 9 (1938), p. 281.
[9] W. J. Dixon, Annals of Math. Stat., Vol. 11 (June, 1940), p. 199.
[10] Milton Friedman, Jour. Amer. Stat. Assoc., Vol. 32 (1937), p. 675.
[11] J. Neyman and E. Pearson, Trans. Royal Soc., A., Vol. 231 (1933), p. 295.
[12] A. Wald, Bull. Amer. Math. Soc., Vol. 46 (1940), p. 235.
[13] A. Wald, Bull. Amer. Math. Soc., Vol. 47 (1941), p. 396.
[14] A. Wald, Trans. Amer. Math. Soc., Vol. 46 (1939), p. 280.
[15] Harald Cramér, Random Variables and Probability Distributions, Cambridge (1937).
[16] F. Hausdorff, Mengenlehre (Second Edition), Berlin and Leipzig, 1927.






















ON THE THEORY OF TESTING COMPOSITE HYPOTHESES 
WITH ONE CONSTRAINT 


By Henry Scheffé
Princeton University 


1. Introduction. Our purpose is to extend some of the Neyman-Pearson
theory of testing hypotheses to cover certain cases of frequent interest which are
complicated by the presence of nuisance parameters. Our results give methods
of finding critical regions of types B and $B_1$. Type B regions were defined by
Neyman [1] for the case of one nuisance parameter. Type $B_1$ regions are the
natural generalization of the type $A_1$ regions of Neyman and Pearson [5] to
permit the occurrence of nuisance parameters. The reader familiar with the
work of these authors will recognize most of the notation and some of the
methods.

We consider a joint distribution of n random variables $x_1, x_2, \ldots, x_n$,
depending on l parameters $\theta_1, \theta_2, \ldots, \theta_l$, $l \le n$. The functional form of
the distribution is given. The random variables may be regarded as the
coordinates of a point E in an n-dimensional sample space W, the parameters
as the coordinates of a point $\Theta$ in an l-dimensional space $\Omega$ of admissible
parameter values. $\Omega$, unlike W, in general will not be a complete Euclidean space.
Let $\omega$ denote the subspace of $\Omega$ defined by $\theta_1 = \theta_1^0$. The hypothesis we consider is


$H_0: \Theta \in \omega.$


Neyman and Pearson [4] call $H_0$ a hypothesis with $l - 1$ degrees of freedom;
for our present purpose we shift the emphasis by saying it has one constraint.

It is clear that whenever we test whether a parameter has a given value, and
other parameters occur in the distribution, we are testing a hypothesis with one
constraint. Hypotheses of the type $\theta_1 = \theta_2$, in which we do not specify the
common value of $\theta_1$ and $\theta_2$, nor the values of any other parameters, may always
be transformed to $H_0$ by choosing new parameters. In general, the hypothesis
that the parameter point $\Theta$ lies on some hypersurface in $\Omega$, $g(\theta_1, \theta_2, \ldots, \theta_l) = g_0$,
may be transformed to $H_0$ if the function g satisfies certain conditions,
say, g is continuous and monotone-increasing in one of the $\theta$'s for all $\Theta$ in $\Omega$.
Another circumstance lending importance to the theory of testing hypotheses
with one constraint is its connection with the theory of confidence intervals,
which we shall point out below.

The path which led Neyman to critical regions of type B is the following:
Every Borel-measurable region w of sample space determines a test of $H_0$,
which consists of rejecting $H_0$ if and only if E falls in w. In deciding which
is a most efficient test, one may limit the competition to similar¹ regions, if
such exist. Because of the general non-existence [2, p. 372] of uniformly most
powerful tests, one is led to consider common best critical regions [4] if he is
interested only in alternatives $\theta_1 < \theta_1^0$ (or $\theta_1 > \theta_1^0$), or else regions giving an
unbiased test [1, p. 251]. Narrowing the competition further to the latter
class of regions, one is led to regions of type B if he seeks tests which are most
powerful for $\theta_1$ very near to $\theta_1^0$, and to type $B_1$ regions if he is not content with
this. These types of regions are defined in section 2.

¹ Defined by condition (a) of definition 1.

We may now state the relationship of hypotheses with one constraint to the
theory of confidence intervals [2]. To find confidence intervals for $\theta_1$, we must
first find similar regions $w(\theta_1^0)$ for testing $H_0$. If with every admissible $\theta_1^0$
we can associate a $w(\theta_1^0)$, then confidence regions for $\theta_1$ are determined, and if
these be intervals, they are confidence intervals. Every class of similar regions
mentioned above is intimately related to a category of confidence intervals.
In particular, to find Neyman's short unbiased confidence intervals we must
first solve the problem of type B regions. Likewise, if we define shortest
unbiased confidence intervals in the obvious way along the lines laid down by
Neyman, their discovery rests on the solution of the problem of type $B_1$ regions.

While the assumptions of section 3, especially 3°, are unpleasantly restrictive
(they are obviously tailored to fit the proof rather than the problem), they are
nevertheless satisfied in many sampling problems associated with normal
distributions. An application of the theorems of section 4 will be given in
another paper, "On the ratio of the variances of two normal populations." The present
theory was needed to round out that paper and was originally planned as a 
section thereof. However, it seems desirable for the convenience of other 
workers who might have use for the theory not to bury it under the preceding 
title. 

Section 5 consists of an appendix on the moment problem raised by
assumption 5°.


2. Definitions. The symbols $w$, $w_0$, $w_1$ will always be understood to denote
Borel-measurable regions in W. We shall symbolize $\partial^i \Pr\{E \in w \mid \Theta\}/\partial\theta_1^i$ for
i = 0, 1, 2 by $P(w \mid \Theta)$, $P'(w \mid \Theta)$, $P''(w \mid \Theta)$, respectively. Since $\theta_1$ plays a
distinguished rôle, it will often be convenient to write $\Theta = (\theta_1, \vartheta)$, where the
nuisance parameters are denoted by $\vartheta = (\theta_2, \theta_3, \ldots, \theta_l)$.

DEFINITION 1: $w_0$ is said to be a type B region for testing $H_0$ if for all $\Theta$ in $\omega$

(a) $P(w_0 \mid \theta_1^0, \vartheta) = \alpha$, where $\alpha$ is independent of $\Theta$,

(b) $P'(w_0 \mid \theta_1^0, \vartheta)$, $P''(w_0 \mid \theta_1^0, \vartheta)$ exist,

(c) $P'(w_0 \mid \theta_1^0, \vartheta) = 0$,

(d) $P''(w_0 \mid \theta_1^0, \vartheta) \ge P''(w_1 \mid \theta_1^0, \vartheta)$ for all $w_1$ satisfying (a), (b), (c).

DEFINITION 2: $w_0$ is said to be of type $B_1$ if the conditions (a), (b'), (c), (d')
are satisfied. The conditions (a), (c) are given in definition 1, the other two are

(b') $P'(w_0 \mid \theta_1, \vartheta)$ is continuous in $\theta_1$ at $\theta_1 = \theta_1^0$ for all $\Theta$ in $\omega$,

(d') $P(w_0 \mid \theta_1, \vartheta) \ge P(w_1 \mid \theta_1, \vartheta)$ for all $w_1$ satisfying (a), (b'), (c), and all
$\Theta$ in $\Omega$.






















3. Assumptions. $p(z_1, z_2, \ldots, z_m \mid \Theta)$ will be a generic notation for the
p.d.f. (probability density function) of random variables $z_1, z_2, \ldots, z_m$ whose
distribution depends on $\Theta$. The numbering of the following assumptions follows
that of Neyman elsewhere [1].

1°. (a) There exists a p.d.f. $p(E \mid \Theta)$ such that for any w, and any $\Theta \in \Omega$,


(1)    $P(w \mid \Theta) = \int_w p(E \mid \Theta)\, dW$,


where dW denotes the volume element $dx_1\, dx_2 \cdots dx_n$.

(b) The region $W_+$ in W defined by $p(E \mid \Theta) > 0$ is independent of $\Theta$ for
$\Theta \in \omega$.

(c) The connectivity of $\omega$ is such that it is possible to pass from any point
$\Theta'$ in $\omega$ to any other point $\Theta''$ in $\omega$ by a path lying entirely in $\omega$ and consisting
of a finite number of segments on each of which all but one of $\theta_2, \theta_3, \ldots, \theta_l$
are constant.

2°. For all $E \in W_+$ and $\Theta \in \omega$, $p(E \mid \Theta)$ is differentiable twice with respect to $\theta_1$,
and indefinitely with respect to $\theta_2, \theta_3, \ldots, \theta_l$. For any w, and any $\Theta \in \omega$,
the corresponding derivatives of $P(w \mid \Theta)$ exist and may be obtained by
differentiating under the integral sign in (1).


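We now define


$\phi_i = \partial \log p(E \mid \Theta)/\partial\theta_i$,    $\phi_{ij} = \partial\phi_i/\partial\theta_j$,    i, j = 1, 2, ..., l.


3°. For all $E \in W_+$ and $\Theta \in \omega$, $\phi_i = \phi_i(E, \Theta)$ is continuous in E, i = 1, 2, ..., l,
and


(2)    $\phi_{ij} = A_{ij} + \sum_{k=2}^{l} B_{ijk}\,\phi_k$,    i, j = 2, 3, ..., l,

(3)    $\phi_{i1} = A_{i1} + \sum_{k=1}^{l} B_{i1k}\,\phi_k$,    i = 1, 2, ..., l,

where $A_{ij} = A_{ij}(\theta_1^0, \vartheta)$, $B_{ijk} = B_{ijk}(\theta_1^0, \vartheta)$ are continuous in each of
$\theta_2, \theta_3, \ldots, \theta_l$.

4°. The matrix $(\partial\phi_i/\partial x_j)$, i = 1, 2, ..., l; j = 1, 2, ..., n, contains an
$l \times l$ minor which is non-singular² for all $E \in W_+$ and $\Theta \in \omega$, and whose elements
are continuous in E.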

Write $\Phi = (\phi_2, \phi_3, \ldots, \phi_l)$, and denote by $p(\phi_1, \Phi \mid w, \Theta)$ the p.d.f. of $(\phi_1, \Phi)$
calculated under the assumption that $E \in w$, i.e., that the p.d.f. of E is
$p(E \mid \Theta)/P(w \mid \Theta)$ for $E \in w$ and zero for $E \in W - w$. Define


(4)    $Q_s(\Phi \mid w, \Theta) = \int_{-\infty}^{+\infty} \phi_1^s\, p(\phi_1, \Phi \mid w, \Theta)\, d\phi_1$.


² If for each $\Theta \in \omega$, 4° is violated on an exceptional set $U(\Theta)$ for which $P(U(\Theta) \mid \Theta) = 0$,
the theorems 1 and 2 may still be valid. What is essential is the existence of the p.d.f.
$p(\phi_1, \phi_2, \ldots, \phi_l \mid \Theta)$ for all $\Theta \in \omega$. On reconsidering the theorems and their proofs, the
reader will see that if the set $U(\Theta)$ is deleted from $W_+$, then 1°(b) may be violated, but not
seriously, and no essential changes are necessary. The addition of the necessary
qualifying clauses to our statements, regarding sets of probability zero, would encumber the
developments.


Let $w_1$ be any region satisfying condition (a) of definition 1:

5°. We assume, for each $\Theta \in \omega$, that if the moments³ of $Q_s(\Phi \mid w_1, \Theta)$ and
$Q_s(\Phi \mid W, \Theta)$ are the same then these functions are equal for almost all $\Phi$


(a) for s = 0,
(b) for s = 1.
Note that $Q_0$ is a p.d.f., $Q_1$ is not.


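4. Theorems. A result of Neyman's [1] for l = 2 is generalized in the
following⁴

THEOREM 1: Under the assumptions 1° to 5°, consider the existence of functions
$k_i(\Phi, \theta_1^0, \vartheta)$, i = 1, 2, such that $k_1 < k_2$ and


(5)    $\int_{k_1(\Phi, \theta_1^0, \vartheta)}^{k_2(\Phi, \theta_1^0, \vartheta)} \phi_1^s\, p(\phi_1, \Phi \mid \theta_1^0, \vartheta)\, d\phi_1 = (1 - \alpha) \int_{-\infty}^{+\infty} \phi_1^s\, p(\phi_1, \Phi \mid \theta_1^0, \vartheta)\, d\phi_1$,    s = 0, 1,

for all $\Phi = (\phi_2, \phi_3, \ldots, \phi_l)$. If such functions exist for some $\Theta = \Theta' \in \omega$, they
exist for all $\Theta \in \omega$. Then the region $w_0$ in W defined by


(6)    $\phi_1(E, \theta_1^0, \vartheta) < k_1(\Phi, \theta_1^0, \vartheta)$  or  $\phi_1(E, \theta_1^0, \vartheta) > k_2(\Phi, \theta_1^0, \vartheta)$


is independent of $\vartheta$, and is a region of type B for testing the hypothesis $H_0$.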
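
To make condition (5) concrete, the following sketch solves the two equations (s = 0 and s = 1) numerically for one fixed value of $\Phi$. Everything specific here is an assumption made purely for illustration and is not taken from the paper: the conditional density of $\phi_1$ is taken to be a Gamma density with integer shape 3 and unit scale, and the helper names are hypothetical. The sketch uses the identity that x times the Gamma(shape) density equals shape times the Gamma(shape + 1) density, so both conditions reduce to differences of distribution functions.

```python
import math


def gamma_cdf(x, shape):
    """CDF of a Gamma(shape, 1) variable for integer shape (closed form)."""
    if x <= 0:
        return 0.0
    term, total = 1.0, 1.0
    for j in range(1, shape):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi] by bisection; assumes f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def acceptance_limits(alpha, shape=3, upper=200.0):
    """Return k1 < k2 satisfying both conditions of (5) for the assumed density."""
    # Lower alpha-quantile: k1 must lie below it so that mass 1 - alpha fits above k1.
    q_alpha = bisect(lambda x: gamma_cdf(x, shape) - alpha, 0.0, upper)

    def k2_given(k1):
        # s = 0 condition: the mass between k1 and k2 equals 1 - alpha.
        target = gamma_cdf(k1, shape) + 1.0 - alpha
        return bisect(lambda x: gamma_cdf(x, shape) - target, k1, upper)

    def first_moment_gap(k1):
        # s = 1 condition, rewritten with the Gamma(shape + 1) distribution function.
        k2 = k2_given(k1)
        return gamma_cdf(k2, shape + 1) - gamma_cdf(k1, shape + 1) - (1.0 - alpha)

    k1 = bisect(first_moment_gap, 0.0, q_alpha)
    return k1, k2_given(k1)


k1, k2 = acceptance_limits(0.05)
```

Under these assumptions the region of the form (6) is the set where $\phi_1$ falls below k1 or above k2. For a symmetric density the two limits would be placed symmetrically; the skewness of the Gamma density is what makes the second condition bind.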

Since throughout the proof $\Theta = (\theta_1^0, \vartheta)$, we shall write $\Theta$ in place of these
symbols to simplify the printing. It is to be understood that every statement
in the proof involving the symbol $\Theta$ is asserted for all $\Theta$ in $\omega$.

We suppose first that a type B region $w_0$ exists in $W_+$. Then from (a), (c)
of definition 1 and assumptions 1°(a) and 2°,


(7)    $\int_{w_0} p(E \mid \Theta)\, dW = \alpha$,


(8)    $\int_{w_0} \phi_1\, p(E \mid \Theta)\, dW = 0$.


Since the value of the integral (7) is independent of $\vartheta$, all its derivatives with
respect to $\theta_2, \theta_3, \ldots, \theta_l$ must vanish. This leads [3, pp. 50, 51. Insert k,
before @, | in (15)] to


(9)    $\frac{1}{\alpha} \int_{w_0} \prod_{i=2}^{l} \phi_i^{k_i}\, p(E \mid \Theta)\, dW = M(k_2, k_3, \ldots, k_l \mid \Theta)$,    $k_i = 0, 1, 2, \ldots$,


where M is independent of $w_0$, and thus has the value obtained from (9) by
putting $w_0 = W$ and $\alpha = 1$. In particular,


(10)    $\int_{w_0} \phi_i\, p(E \mid \Theta)\, dW = 0$,    i = 2, 3, ..., l.


³ By this term we include "product moments."

⁴ When I communicated this theorem to Professor Neyman, he informed me it was
among the results of a thesis by R. Satô, "Contributions to the theory of testing statistical
composite hypotheses," University of London, 1937, and he kindly sent me a copy of the MS.
I decided nevertheless to publish my version of theorem and proof, since for the reasons
indicated in section 1 this theory should be available in the literature.


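The necessary condition (9) for (7) is also sufficient. Denoting by $\mathcal{E}(f \mid w, \Theta)$
the expected value of a function $f(E, \Theta)$ calculated under the assumption that
$E \in w$, equation (9) may be written


(11)    $\mathcal{E}\left(\prod_{i=2}^{l} \phi_i^{k_i} \,\Big|\, w_0, \Theta\right) = \mathcal{E}\left(\prod_{i=2}^{l} \phi_i^{k_i} \,\Big|\, W, \Theta\right)$.


From assumption 5°(a) it then follows that

(12)    $Q_0(\Phi \mid w_0, \Theta) = Q_0(\Phi \mid W, \Theta)$


for almost all $\Phi$. Conversely, (12) implies (11).
In a similar manner we get from (8) with the aid of (9),


(13)    $\mathcal{E}\left(\phi_1 \prod_{i=2}^{l} \phi_i^{k_i} \,\Big|\, w_0, \Theta\right) = \mathcal{E}\left(\phi_1 \prod_{i=2}^{l} \phi_i^{k_i} \,\Big|\, W, \Theta\right)$.


We calculate the moments of the function $Q_1(\Phi \mid w, \Theta)$ to be


$\mathcal{E}\left(\phi_1 \prod_{i=2}^{l} \phi_i^{k_i} \,\Big|\, w, \Theta\right)$,


and hence because of 5°(b), (13) implies


(14)    $Q_1(\Phi \mid w_0, \Theta) = Q_1(\Phi \mid W, \Theta)$


almost everywhere in the $\Phi$-space. The pair of conditions (12), (14) are
equivalent to the pair (7), (8).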

In order that $w_0$ be a type B region, it is necessary and sufficient that it satisfy
(12) and (14) and that


$P''(w_0 \mid \Theta) \ge P''(w_1 \mid \Theta)$


for all $w_1$ satisfying (12) and (14). The inequality may be transformed with
the help of 1°(a), 2°, (3), (7), (8), and (10) to


$\int_{w_0} \phi_1^2\, p(E \mid \Theta)\, dW \ge \int_{w_1} \phi_1^2\, p(E \mid \Theta)\, dW$,


which is equivalent to


$\int \cdots \int \left[\int \phi_1^2\, p(\phi_1, \Phi \mid w_0, \Theta)\, d\phi_1\right] d\phi_2 \cdots d\phi_l \ge \int \cdots \int \left[\int \phi_1^2\, p(\phi_1, \Phi \mid w_1, \Theta)\, d\phi_1\right] d\phi_2 \cdots d\phi_l$.


Sufficient for this is

(15)    $Q_2(\Phi \mid w_0, \Theta) \ge Q_2(\Phi \mid w_1, \Theta)$.
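
The transformation of the inequality on $P''$ can be spelled out as follows. This is a sketch of the omitted step, not the author's own wording; it uses only 1°(a), 2°, the relation (3) with i = 1, and the side conditions (7), (8), (10), and it applies to any region w that satisfies those side conditions.

```latex
% Differentiating (1) twice under the integral sign (assumptions 1°(a), 2°),
% with \partial p/\partial\theta_1 = \phi_1 p:
P''(w \mid \Theta) = \int_w \left(\phi_{11} + \phi_1^2\right) p(E \mid \Theta)\, dW .

% By (3) with i = 1, and then (7), (8), (10):
\int_w \phi_{11}\, p\, dW
  = A_{11} \int_w p\, dW + B_{111} \int_w \phi_1\, p\, dW
    + \sum_{k=2}^{l} B_{11k} \int_w \phi_k\, p\, dW
  = \alpha A_{11} .

% The term \alpha A_{11} is the same for w_0 and w_1, hence
P''(w_0 \mid \Theta) - P''(w_1 \mid \Theta)
  = \int_{w_0} \phi_1^2\, p\, dW - \int_{w_1} \phi_1^2\, p\, dW .
```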


We note the functions in (12), (14), and (15) are all of the form (4) with
s = 0, 1, 2, and propose to transform these to integrals over certain portions
of the sample space W. First, we write (4) in the form


(16)    $Q_s(\Phi \mid w, \Theta) = Q_0(\Phi \mid w, \Theta) \int \phi_1^s\, p(\phi_1 \mid \Phi, w, \Theta)\, d\phi_1 = Q_0(\Phi \mid w, \Theta)\, \mathcal{E}(\phi_1^s \mid \Phi, w, \Theta)$.


Next, we consider "surfaces" $S(\Phi, \Theta)$ in $W_+$, constructed as follows: For
any fixed $\Theta$ let $D(\Theta)$ be the $l - 1$ dimensional domain of values of $\phi_i(E, \Theta)$,
i = 2, 3, ..., l, for $E \in W_+$. A "surface" $S(\Phi, \Theta)$ is the locus of points E
for which


(17)    $\phi_i(E, \Theta) = \phi_i$, a constant,    i = 2, 3, ..., l,


the set of constants being in $D(\Theta)$. Over every "surface" we now define a
density $\bar{p}$: Without loss of generality, and to simplify the notation, we shall
assume that the non-singular minor postulated in 4° contains the minor $(\partial\phi_i/\partial x_j)$,
i = 2, 3, ..., l; j = 1, 2, ..., l - 1, and denote by $J(E, \Theta)$ its determinant.
For E on $S(\Phi, \Theta)$ we define the density


(18)    $\bar{p}(E \mid \Theta) = p(E \mid \Theta)/|J(E, \Theta)|$,


and consider "surface" integrals


(19)    $\int_{wS(\Phi, \Theta)} F_s(E, \Theta)\, dx_l\, dx_{l+1} \cdots dx_n$,


where

(20)    $F_s(E, \Theta) = \phi_1^s(E, \Theta)\, \bar{p}(E \mid \Theta)$.


A "surface" integral (19) is to be distinguished from an ordinary multiple
integral, in that the integrand is not merely a function of $x_l, x_{l+1}, \ldots, x_n$;
there may be several points E on the surface with the same values for these
coordinates, but different values for the integrand. The integral is to be
thought of as follows: The part $wS(\Phi, \Theta)$ of the "surface" $S(\Phi, \Theta)$ is partitioned
into pieces $\Delta S$, on each a point E is chosen, and the value of the integrand at E
is multiplied by the "area" of the projection (taken non-negative) of $\Delta S$ on the
$x_l, x_{l+1}, \ldots, x_n$-space. The "surface" integral is the limit of the sum of
such products as the norm of the partition approaches zero.

Denoting the integral (19) by I(s) for the moment, we may calculate that
for $\Phi \in D(\Theta)$


$I(s) = I(0)\, \mathcal{E}(\phi_1^s \mid \Phi, w, \Theta)$,    $I(0) = Q_0(\Phi \mid w, \Theta)\, P(w \mid \Theta)$,


and hence we see that the right member of (16) is equal to the integral (19)
divided by $P(w \mid \Theta)$. The desired relationship between the ordinary integrals
(4) and the "surface" integrals (19) is thus


(21)    $Q_s(\Phi \mid w, \Theta) = \int_{wS(\Phi, \Theta)} F_s(E, \Theta) \prod_{j=l}^{n} dx_j \Big/ P(w \mid \Theta)$.


The conditions (12), (14), (15) may now be written


(22)    $\int_{w_0 S(\Phi, \Theta)} F_s(E, \Theta) \prod_{j=l}^{n} dx_j = \alpha \int_{S(\Phi, \Theta)} F_s(E, \Theta) \prod_{j=l}^{n} dx_j$,    s = 0, 1,

(23)    $\int_{w_0 S(\Phi, \Theta)} F_2(E, \Theta) \prod_{j=l}^{n} dx_j \ge \int_{w_1 S(\Phi, \Theta)} F_2(E, \Theta) \prod_{j=l}^{n} dx_j$,


if $\Phi$ is in the domain $D(\Theta)$, else they are satisfied trivially. $w_0$ will be a type B
region if equations (22) are satisfied for almost all $\Phi \in D(\Theta)$, and if (23) is valid
for all $w_1$ satisfying (22).

We now hold $\Theta$ fixed in $\omega$ and $\Phi$ fixed in $D(\Theta)$, so that $S(\Phi, \Theta)$ is fixed, and the
right members of equations (22) have constant values. The proof [5, p. 11]
of the lemma of Neyman and Pearson giving sufficient conditions that a region
maximize an integral, subject to integral side-conditions, is easily seen to be
valid for our "surface" integrals, and a sufficient condition that $w_0 S(\Phi, \Theta)$
have the desired property is then that it be defined by


(24)    $\phi_1^2(E, \Theta) > a_0 + a_1 \phi_1(E, \Theta)$,


where $a_0$, $a_1$ are independent of E on $S(\Phi, \Theta)$, and are such that equations (22)
are satisfied. Since $\Theta$ and $\Phi$ are fixed, we may permit $a_i$ to be of the nature
$a_i = a_i(\Phi, \Theta)$, i = 0, 1. Introducing functions $k_1 < k_2$, $k_i = k_i(\Phi, \Theta)$, and
defining $a_0$, $a_1$ from


$a_0 = -k_1 k_2$,    $a_1 = k_1 + k_2$,


the inequality (24) is satisfied if (6) is. Still holding $\Theta$ fixed, suppose that
$k_1$, $k_2$ can be determined for all $\Phi$ (hence almost all $\Phi$) in $D(\Theta)$ so that for the
part $w_0 S(\Phi, \Theta)$ of $S(\Phi, \Theta)$, defined by (6), the equations (22) are satisfied. The
parts $w_0 S(\Phi, \Theta)$ of "surfaces" then sweep out a "solid" $w_0(\Theta)$ in $W_+$, defined
by (6). If we can similarly determine $k_1$ and $k_2$, and hence $w_0(\Theta)$, for every $\Theta$
in $\omega$, and if furthermore $w_0(\Theta)$ is independent of $\Theta$, then it is the type B region
we seek.
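
The passage from (24) to (6) rests on a one-line factorization, written out here for convenience with $\phi_1$ abbreviating $\phi_1(E, \Theta)$:

```latex
\phi_1^2 - a_1 \phi_1 - a_0
  = \phi_1^2 - (k_1 + k_2)\,\phi_1 + k_1 k_2
  = (\phi_1 - k_1)(\phi_1 - k_2) .

% Since k_1 < k_2, the product is positive exactly when both factors have the same sign:
(\phi_1 - k_1)(\phi_1 - k_2) > 0
  \iff \phi_1 < k_1 \ \text{ or } \ \phi_1 > k_2 .
```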

The equations (22) have now served their main purpose, and we return to
their equivalents, (12) and (14). For $w_0(\Theta)$ defined by (6)


$p(\phi_1, \Phi \mid w_0, \Theta) = p(\phi_1, \Phi \mid W, \Theta)/\alpha$    if $\phi_1 < k_1$ or $\phi_1 > k_2$,


and vanishes otherwise, and hence equations (12) and (14) are equivalent to (5).

The remainder of the proof consists of deducing that $k_1$, $k_2$ exist, and that the
associated region $w_0(\Theta)$ is independent of $\Theta$, for all $\Theta \in \omega$, from the hypothesis
of our theorem that $k_1$, $k_2$ exist for some $\Theta = \Theta'$. By 1°(c), $\Theta'$ lies on a line
segment L entirely in $\omega$, on which all but one of the nuisance parameters, say
$\theta_2$, are constant. Let us vary $\Theta$ over L. Then $\theta_1, \theta_3, \ldots, \theta_l$ remain fixed
and $\theta_2$ varies over an interval I. The equations (2) for j = 2 now become
ordinary differential equations in which the independent variable is $\theta_2$, the
dependent variables are $\phi_2, \phi_3, \ldots, \phi_l$, and $\theta_1, \theta_3, \ldots, \theta_l$ are parameters.
A well known existence theorem assures us of the existence of particular
solutions $u_i$ and a non-singular (for all $\theta_2$ in I) matrix $(u_{ij})$ of complementary
solutions, i, j = 2, 3, ..., l, such that the general solution is


$\phi_i = u_i + \sum_{j=2}^{l} u_{ij} c_j$.


The $u_i$ are determined by initial conditions for the system (2) with j = 2, and
the $u_{ij}$ by sets of initial conditions for the corresponding complementary system.
Clearly, if these initial conditions are all chosen independent of E, then since
the coefficients of the differential equations are all independent of E, the
solutions $u_i$ and $u_{ij}$ enjoy the same property. On the other hand, the $c_j$ are
independent of $\theta_2$. Hence


(25)    $\phi_i(E, \theta_2) = u_i(\theta_2) + \sum_{j=2}^{l} u_{ij}(\theta_2)\, c_j(E)$,    i = 2, 3, ..., l.


l 
(25) @i(E, 2) = Ui(O2) + Do wis(O2)e(E), i= 2,3,---, 1. 
j=2 
The dependence of the $\phi$'s, u's and c's on the parameters $\theta_1, \theta_3, \ldots, \theta_l$ has
not been indicated, since these remain fixed throughout the present calculations.
Let $\mathcal{D}$ be the $l - 1$ dimensional domain of the values of $c_j(E)$ for $E \in W_+$,
and C: $(c_2, c_3, \ldots, c_l)$ be a point in $\mathcal{D}$, and denote by S(C) the "surface"
$c_j(E) = c_j$. Denote the surface $S(\Phi, \Theta)$ defined in (17) by $S(\Phi, \theta_2)$, and the
domain $D(\Theta)$ of $\Phi$ by $D(\theta_2)$. Then since $|u_{ij}| \ne 0$, therefore for every $\theta_2 \in I$,
every S(C) with $C \in \mathcal{D}$ is identical with some $S(\Phi, \theta_2)$ with $\Phi \in D(\theta_2)$, and vice
versa. From this we conclude for later reference: (A) the functions $c_j(E)$
are constant on every $S(\Phi, \theta_2)$; (B) if $\theta_2'$, $\theta_2''$ are any two values in I, then for
every $\Phi = \Phi'' \in D(\theta_2'')$ there exists a $\Phi' \in D(\theta_2')$ such that $S(\Phi', \theta_2')$ is identical
with $S(\Phi'', \theta_2'')$, and vice versa.
Now let us integrate with respect to $\theta_2$ the equation


$\partial \log p(E \mid \theta_2)/\partial\theta_2 = \phi_2 = u_2(\theta_2) + \sum_{j=2}^{l} u_{2j}(\theta_2)\, c_j(E)$.


We get


$\log p(E \mid \theta_2) = v(\theta_2) + \sum_{j=2}^{l} v_j(\theta_2)\, c_j(E) + f(E)$,


where $v(\theta_2)$, $v_j(\theta_2)$, $f(E)$, and all new undefined symbols in the sequel have
obvious meanings. Hence


(26)    $p(E \mid \theta_2) = v(\theta_2)\, f(E) \exp\left[\sum_{j=2}^{l} v_j(\theta_2)\, c_j(E)\right]$.


Next we differentiate the equations (25) with respect to $x_k$, and write the
result in matrix form,


$(\partial\phi_i/\partial x_k) = (u_{ij})(\partial c_j/\partial x_k)$,    i, j = 2, 3, ..., l; k = 1, 2, ..., l - 1.

Taking determinants, we have

(27)    $J(E, \theta_2) = J_1(\theta_2)\, J_2(E)$.














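Finally, we shall need to know the nature of the dependence of $\phi_1$ on $\theta_2$ and E:
From (3),


$\partial\phi_1/\partial\theta_2 = A_{21}(\theta_2) + B_{211}(\theta_2)\,\phi_1 + \sum_{k=2}^{l} B_{21k}(\theta_2)\,\phi_k$.

Substituting from (25), we get

$\partial\phi_1/\partial\theta_2 = B_{211}(\theta_2)\,\phi_1 + A(\theta_2) + \sum_{j=2}^{l} B_j(\theta_2)\, c_j(E)$,


and integrating,


$\phi_1(E, \theta_2) = B(\theta_2) \left[\int^{\theta_2} \frac{A(t) + \sum_{j=2}^{l} B_j(t)\, c_j(E)}{B(t)}\, dt + g(E)\right]$,


where

(28)    $B(\theta_2) = \exp\left[\int^{\theta_2} B_{211}(\eta)\, d\eta\right]$.

Thus

(29)    $\phi_1(E, \theta_2) = A(\theta_2) + \sum_{j=2}^{l} B_j(\theta_2)\, c_j(E) + B(\theta_2)\, g(E)$.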
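
The integration step leading to (28) and (29) is the usual integrating-factor argument for a first-order linear equation. In the sketch below the coefficient of $\phi_1$ is written $b(\theta_2)$, a label introduced here only for brevity, and $h(\theta_2, E)$ abbreviates the terms that do not contain $\phi_1$.

```latex
% Equation to be integrated, with h(\theta_2, E) = A(\theta_2) + \sum_{j=2}^{l} B_j(\theta_2) c_j(E):
\frac{\partial \phi_1}{\partial \theta_2} = b(\theta_2)\,\phi_1 + h(\theta_2, E),
\qquad
B(\theta_2) = \exp\left[\int^{\theta_2} b(\eta)\, d\eta\right] > 0 .

% Because B' = b B, dividing by B turns the left side into an exact derivative:
\frac{\partial}{\partial \theta_2}\left[\frac{\phi_1}{B(\theta_2)}\right]
  = \frac{h(\theta_2, E)}{B(\theta_2)}
\quad\Longrightarrow\quad
\phi_1(E, \theta_2)
  = B(\theta_2)\left[\int^{\theta_2} \frac{h(t, E)}{B(t)}\, dt + g(E)\right] .

% h is linear in the c_j(E) with coefficients depending on \theta_2 only, so the
% integral is again of that form; this gives (29) with new functions A and B_j.
```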


In equations (22) we now use the definitions (20), (18) for the integrands and
then substitute (26), (27), (29). As a result we obtain the equality of


$\int_{w_0 S(\Phi, \theta_2)} \left[A(\theta_2) + \sum_{j=2}^{l} B_j(\theta_2)\, c_j(E) + B(\theta_2)\, g(E)\right]^s \frac{v(\theta_2)\, f(E) \exp\left[\sum_{j=2}^{l} v_j(\theta_2)\, c_j(E)\right]}{|J_1(\theta_2)\, J_2(E)|} \prod_{j=l}^{n} dx_j$


and $\alpha$ times the "surface" integral of the same integrand over $S(\Phi, \theta_2)$. Putting
first s = 0 and then s = 1, and employing the previous conclusion (A), we find
that the equations (22) are equivalent to


(30)    $\int_{w_0 S(\Phi, \theta_2)} \{g^s(E)\, f(E)/|J_2(E)|\} \prod_{j=l}^{n} dx_j = \alpha \int_{S(\Phi, \theta_2)} \{g^s(E)\, f(E)/|J_2(E)|\} \prod_{j=l}^{n} dx_j$,    s = 0, 1.

Again using the expression (29) for $\phi_1$, and noting from (28) that $B(\theta_2) > 0$,
we may write the inequality (6) in the form

(31)    $g(E) < \kappa_1(\Phi, \theta_2)$  or  $g(E) > \kappa_2(\Phi, \theta_2)$,


where


(32)    $\kappa_i(\Phi, \theta_2) = \left[k_i(\Phi, \theta_1^0, \vartheta) - A(\theta_2) - \sum_{j=2}^{l} B_j(\theta_2)\, c_j(E)\right] \Big/ B(\theta_2)$.













It follows from our hypothesis that for $\theta_2 = \theta_2'$ (the $\theta_2$ coordinate of $\Theta'$) and any
$\Phi \in D(\theta_2')$, functions $\kappa_i(\Phi, \theta_2')$ exist such that for the part $w_0 S(\Phi, \theta_2')$ of $S(\Phi, \theta_2')$,
defined by (31), equations (30) are satisfied. The region $w_0(\Theta')$ is "swept out"
by $w_0 S(\Phi, \theta_2')$ as $\Phi$ ranges over $D(\theta_2')$. Now let $\Theta''$ be any other $\Theta \in L$, call its
$\theta_2$ coordinate $\theta_2''$, let $\Phi''$ be any $\Phi \in D(\theta_2'')$, and consider the possibility of finding
$\kappa_i(\Phi'', \theta_2'')$ such that on the part $w_0 S(\Phi'', \theta_2'')$ of $S(\Phi'', \theta_2'')$, defined by (31),
equations (30) are satisfied. From the conclusion (B), $S(\Phi'', \theta_2'')$ is identical with
$S(\Phi', \theta_2')$ for a suitably chosen $\Phi' \in D(\theta_2')$. Hence if we take $\kappa_i(\Phi'', \theta_2'') =
\kappa_i(\Phi', \theta_2')$, then $w_0 S(\Phi'', \theta_2'')$ becomes identical with $w_0 S(\Phi', \theta_2')$ where equations
(30) are already satisfied. Letting $\Phi''$ range over $D(\theta_2'')$, every $w_0 S(\Phi'', \theta_2'')$ thus
determined becomes identical with some $w_0 S(\Phi', \theta_2')$, and vice versa, by (B).
Thus the region $w_0(\Theta'')$ "swept out" is identical with $w_0(\Theta')$. This process
defines $\kappa_i(\Phi, \theta_2)$ for all $\theta_2 \in I$ and $\Phi \in D(\theta_2)$, and hence determines $k_i(\Phi, \theta_1^0, \vartheta)$
from (32). We now have functions $k_i(\Phi, \theta_1^0, \vartheta)$, $k_1 < k_2$, satisfying (5), and
corresponding regions $w_0(\Theta)$ independent of $\Theta$, for all $\Theta \in L$. To conclude the
proof, we use 1°(c) to reach any point $\Theta$ in $\omega$ from $\Theta'$ by a path consisting of a
finite number of segments like L on which only one of the nuisance parameters
varies. The definitions of $k_i(\Phi, \theta_1^0, \vartheta)$ are continued along this path as above
and the region $w_0(\Theta)$ is seen to be independent of $\Theta$ for all $\Theta$ in $\omega$.


The following theorem may be regarded as a generalization of one by Neyman
[6, p. 33] giving sufficient conditions that a type A region be also of type $A_1$:

THEOREM 2. Suppose the assumption 1°(b) holds for all $\Theta \in \Omega$. Denote
$\phi_i(E, \theta_1^0, \vartheta)$ by $\phi_i^0$ and let $R(\vartheta)$ be the domain of values of $\phi_1^0, \phi_2^0, \ldots, \phi_l^0$ for
$E \in W_+$ and $\Theta \in \omega$. Then a sufficient condition that a region $w_0$ of type B, found
by application of theorem 1, be also of type $B_1$ is that for all $\Theta \in \Omega$ and all $E \in W_+$


(33)    $p(E \mid \theta_1, \vartheta) = p(E \mid \theta_1^0, \vartheta)\, g(\phi_1^0, \phi_2^0, \ldots, \phi_l^0; \theta_1^0; \theta_1, \vartheta)$,


where $g(y_1, y_2, \ldots, y_l; \theta_1^0; \theta_1, \vartheta)$ is a function such that $\partial^2 g/\partial y_1^2 > 0$ for all
$y_1, y_2, \ldots, y_l$ in $R(\vartheta)$ and $\Theta \in \Omega - \omega$.

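For the $w_0$ satisfying the sufficient conditions of theorem 1, the conditions (a),
(b'), (c) of definition 2 are satisfied, and it remains only to verify the condition
(d'). The regions $w_1$ admitted for comparison in (d'), as well as $w_0$, must
satisfy the equations (22) since these are equivalent to the conditions (a), (c).
We recall that $\Theta = (\theta_1^0, \vartheta)$ in equations (22) and rewrite them in a notation
better adapted to our present considerations:


(34)    $\int_{w_0 S(\Phi^0, \theta_1^0, \vartheta)} [\phi_1^0]^s \{p(E \mid \theta_1^0, \vartheta)/|J(E, \theta_1^0, \vartheta)|\} \prod_{j=l}^{n} dx_j = \alpha \int_{S(\Phi^0, \theta_1^0, \vartheta)} [\phi_1^0]^s \{p(E \mid \theta_1^0, \vartheta)/|J(E, \theta_1^0, \vartheta)|\} \prod_{j=l}^{n} dx_j$,    s = 0, 1,


where $\Phi^0 = (\phi_2^0, \phi_3^0, \ldots, \phi_l^0) \in D(\theta_1^0, \vartheta)$.
To express the condition (d') in a convenient way, we now "shred" the regions
$w_0$, $w_1$ of (d') for every $\theta_1$, by means of the same "surfaces" we have been using
for $\theta_1 = \theta_1^0$: For any w in $W_+$, $\Theta \in \Omega$, and $\Phi^0 \in D(\theta_1^0, \vartheta)$ we define a "surface"
integral


$I(\Phi^0, w \mid \theta_1, \vartheta) = \int_{wS(\Phi^0, \theta_1^0, \vartheta)} \{p(E \mid \theta_1, \vartheta)/|J(E, \theta_1^0, \vartheta)|\} \prod_{j=l}^{n} dx_j$.


Then


$P(w \mid \theta_1, \vartheta) = \int_{D(\theta_1^0, \vartheta)} I(\Phi^0, w \mid \theta_1, \vartheta)\, d\phi_2^0\, d\phi_3^0 \cdots d\phi_l^0$,


and a sufficient condition for (d') is

(35)    $I(\Phi^0, w_0 \mid \theta_1, \vartheta) \ge I(\Phi^0, w_1 \mid \theta_1, \vartheta)$

for all $\Theta \in \Omega$ and all $\Phi^0 \in D(\theta_1^0, \vartheta)$.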
Again applying the lemma of Neyman and Pearson to the integrands of the
"surface" integrals in (34) and (35), we find that a sufficient condition that our
region $w_0$ be of type $B_1$ is that there exist functions $b_i(\Phi^0, \theta_1^0, \theta_1, \vartheta)$, i = 0, 1,
such that


$p(E \mid \theta_1, \vartheta) > p(E \mid \theta_1^0, \vartheta)\,[b_0 + b_1 \phi_1(E, \theta_1^0, \vartheta)]$

if and only if $E \in w_0$. Employing (33), we may replace this inequality by

(36)    $g(\phi_1^0, \Phi^0; \theta_1^0; \theta_1, \vartheta) > b_0 + b_1 \phi_1^0$.


Define $b_0$, $b_1$ from

$g(k_i^0, \Phi^0; \theta_1^0; \theta_1, \vartheta) = b_0 + b_1 k_i^0$,    i = 1, 2,


where $k_i^0 = k_i(\Phi^0, \theta_1^0, \vartheta)$. Since $k_1^0 < k_2^0$, these equations have unique
solutions $b_0$, $b_1$. Now hold $\Phi^0$, $\theta_1$, $\vartheta$ all fixed ($\theta_1 \ne \theta_1^0$) and consider the graphs of
the members of (36) as functions of $\phi_1^0$. From our definition of $b_0$, $b_1$, these
graphs intersect at $k_1^0$, $k_2^0$. But by hypothesis, the graph of the left member is
everywhere concave up, and hence for $k_1^0 < \phi_1^0 < k_2^0$, it lies below the linear
graph of the right member, and for $\phi_1^0 < k_1^0$ and $\phi_1^0 > k_2^0$, it lies above. That
is, (36) is true if and only if $E \in w_0$.
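
The geometric argument that closes the proof can be illustrated numerically: for a strictly convex function of $\phi_1^0$, the chord through its values at the two limits lies above the function between the limits and below it outside. The sketch below is only an illustration; the convex function (an exponential, chosen because likelihood ratios in exponential families have this shape) and the limits -1 and 2 are arbitrary assumptions, and the helper names are hypothetical.

```python
import math


def chord_coefficients(g, k1, k2):
    """Return b0, b1 such that the line b0 + b1*y meets g at y = k1 and y = k2."""
    b1 = (g(k2) - g(k1)) / (k2 - k1)
    b0 = g(k1) - b1 * k1
    return b0, b1


def inequality_36_holds(g, k1, k2, y):
    """True when g(y) > b0 + b1*y, the analogue of inequality (36) at the point y."""
    b0, b1 = chord_coefficients(g, k1, k2)
    return g(y) > b0 + b1 * y


def example_g(y):
    """An arbitrary strictly convex function standing in for g in (33)."""
    return math.exp(0.7 * y)


outside = [inequality_36_holds(example_g, -1.0, 2.0, y) for y in (-3.0, -1.5, 2.5, 4.0)]
inside = [inequality_36_holds(example_g, -1.0, 2.0, y) for y in (-0.5, 0.0, 1.0, 1.9)]
```

Every entry of `outside` is true and every entry of `inside` is false, which is the "if and only if" used in the last sentence of the proof.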


5. Appendix on the moment problem. Easily applied criteria [8] are
available for the moment problem of assumption 5°(a). The moment problem 5°(b)
is much more difficult, however, because the function to be determined by its
moments is not of constant sign. Below we offer a proof that the solutions of
both problems 5°(a) and 5°(b) are unique in the important case where $p(E \mid \Theta)$
is a multivariate normal p.d.f. and $\phi_1, \phi_2, \ldots, \phi_l$ are polynomials in $x_1,
x_2, \ldots, x_n$ of degree $\le 2$ and not necessarily homogeneous. Since $\Theta$ is held
fixed, we will not indicate dependence on $\Theta$, nor will the dependence of various
functions on s be indicated, since s = 0 or else 1 throughout.

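Let $w_1$, $w_2$ be any two regions, $\alpha_j = P(w_j) \ne 0$, for which the moments of
$Q_s(\Phi \mid w_1)$ and $Q_s(\Phi \mid w_2)$ are the same. To prove the equality (almost
everywhere) of these two functions it suffices to prove that their Fourier transforms
are identical [7, theorem 61]. Suppressing the customary multiple of $\sqrt{2\pi}$, the
Fourier transform of $Q_s(\Phi \mid w_j)$ is


$\Psi_j(t) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} e^{i t \cdot \Phi}\, Q_s(\Phi \mid w_j)\, d\phi_2 \cdots d\phi_l$,


where t is the vector $(t_2, t_3, \ldots, t_l)$ and $t \cdot \Phi = t_2 \phi_2 + \cdots + t_l \phi_l$. From (4)
we get


$\Psi_j(t) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} e^{i t \cdot \Phi}\, \phi_1^s\, p(\phi_1, \Phi \mid w_j)\, d\phi_1\, d\phi_2 \cdots d\phi_l
= \mathcal{E}(e^{i t \cdot \Phi} \phi_1^s \mid w_j)
= \frac{1}{\alpha_j} \int_{w_j} e^{i t \cdot \Phi}\, \phi_1^s(E)\, p(E)\, dW$.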


A device of Cramér and Wold [8] for reducing the dimensionality of the
problem now suggests itself. Let z be a scalar variable and consider $\psi_j(z \mid t) =
\Psi_j(zt)$ for fixed t as a function of z. Obviously if for every fixed t, $\psi_1(z \mid t) =
\psi_2(z \mid t)$, then $\Psi_1(t) = \Psi_2(t)$, and we are through. We propose to prove the
former equality by showing first that $\psi_j$ is an analytic function of z for all real z
and secondly that the coefficients of the power series for $\psi_1$ and $\psi_2$ in powers of z
are equal. Holding t fixed now, $\xi = t \cdot \Phi$ is a polynomial of degree $\le 2$, and


(37)    $\psi_j(z \mid t) = \frac{1}{\alpha_j} \int_{w_j} e^{i z \xi}\, \phi_1^s\, p\, dW$.
y OF 


By our assumption of normality, 


= Cexp| — Dur ey }- Yn = Ue — Uy 


xk, y=1 


where the matrix (a,,) is positive definite. To prove the analyticity of y; 
for any real z = 2, let z = 2% + ¢, and restrict ¢ to real values. Substitute 
in (37) 


= > i ae + — a 


q=0 


(sé), 


where | fn(¢é) | . Then 
m—1 qa ws 
Wile + ¢|t) =p [« 208 £1 gt dW + Rim(20, $), 
q=0 J: Qj %w; 
where 


«ae 


= = 
m! a; 


| | ce f,.(cE)E” o1 p dW, 
























and all integrands are absolutely integrable over W. Let o be the sphere of 
unit radius with center at (ui, we, °°, Hn) in W and write 


Rim = eT f. 7 [| 


Call the two terms of the right member R;» and R¥ m , 
R jm = | + Na ° 
sl oe isis . 
| Rim | & - i |g o;| pdWw. 
M! a; Sw 50 
Let M = max |é|, MW; = max | ¢; |, for Leo. Then 


(eg ees [ paw < M,| M¢|"/mta;. 


m! a; 
Hence R;, — 0 for all real ¢ as m > &. 
fa R’., | “ od / | em 8 | dw ; 
—s la on™ me 
n . 4 
Let r = (o i) , and M,., M3; be the sums of the absolute values of the coeffi- 
x=] 


cients of the polynomials ¢; , £—, respectively, when expanded in powers of y, . 
Then for Ee W — o, |¢i| S Mor’, |&| < Mor’, p S C exp (—Xd7”), where 
\ > 0 is the smallest characteristic root of (a,,). Hence 







ae CM 


| Rim iso™ Ms§ : 


2m+2 _—Ar2 , 
—— Ff ote aw. 
Mm. A; Wo 





Integrating over spherical shells concentric with ¢, dW = Myr" dr, and 


IIA 


1 Ri,| <= CMaMs| Mae" | * amintt Wet g CMa Ma| Mag |" tr 
7m = . 
1 0 


m! a; m! a; 


If we evaluate the last integral in terms of a Gamma function and employ
Stirling's formula we easily find that for $M_3 |\zeta| < \lambda$, $R''_{jm} \to 0$. The
convergence of $R_{jm}$ to zero for real $\zeta$, $|\zeta| < \lambda / M_3$, is sufficient to
insure the analyticity of $y_j$.

Now let $z_0 = 0$. Then the coefficient of $\zeta^q$ in the power series for $y_j$ is


94a 


A [ (ta¢2 + +++ + tii) dip dW, 


! 
q:. Qj w j 


a linear combination (the same for $j = 1, 2$) of the $q$-th order moments of
$Q_s(\Phi \mid w_j)$, and hence corresponding coefficients for $y_1$ and $y_2$ are equal.







REFERENCES 


[1] J. Neyman, "Sur la vérification des hypothèses statistiques composées," Bull. Soc.
Math. France, Vol. 63 (1935), pp. 246-266.
[2] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory
of probability," Phil. Trans. Roy. Soc. London, A, Vol. 236 (1937), pp. 333-380.
[3] J. Neyman, "On a statistical problem arising in routine analyses and in sampling
inspections of mass production," Annals of Math. Stat., Vol. 12 (1941), pp. 46-76.
[4] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses," Phil. Trans. Roy. Soc. London, A, Vol. 231 (1933), pp. 289-337.
[5] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical
hypotheses: Part I," Stat. Res. Mem., Vol. 1 (1936), pp. 1-37.
[6] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical
hypotheses: Part II," Stat. Res. Mem., Vol. 2 (1938), pp. 25-36.
[7] S. Bochner, Vorlesungen über Fouriersche Integrale, Leipzig, 1932.
[8] H. Cramér and H. Wold, "Some theorems on distribution functions," Jour. London
Math. Soc., Vol. 11 (1936), pp. 290-294.













ON THE PROBLEM OF MULTIPLE MATCHING 


By I. L. Battin
Drew University 


1. Introduction. The problem of determining the distribution of the number
of "hits" or "matchings" under random matching of two decks of cards has
received attention from a number of authors within the last few years. In 1934
Chapman [2] considered pairings between two series of $t$ elements each, and
later [3] generalized the problem to series of $u$ and $t$ ($< u$) elements respectively.
In the same paper he also considered the distribution of the mean number of
correct matchings resulting from $n$ independent trials, and gave a method, and
tables, for determining the significance of any obtained mean. In 1937 Bartlett
[1] considered matchings of two decks of cards, using a number of interesting
moment generating functions. In 1937 Huntington [12, 13] gave tables of
probabilities for matchings between decks with the compositions $(5^5)$, $(4^4)$, and
$(3^3)$, where $(s^t)$ denotes a deck consisting of $s$ of each of $t$ kinds of cards. More
generally $(s_1 s_2 \cdots s_t)$ denotes $s_1$ cards of the first kind, $s_2$ of the second, etc.
Sterne [16] has given the first four moments of the frequency distribution for
the $(5^5)$ case and has fitted a Pearson Type I distribution function to the
distribution. Sterne obtained his results by considering the probabilities in a
$5 \times 5$ contingency table. He also considered the $4 \times 4$ and $3 \times 3$ cases. In 1938
Greville [7] gave a table of the exact probabilities for matchings between two
decks of compositions $(5^5)$. Greenwood [4] obtained the variance of the
distribution of hits for matchings between two decks having the respective
compositions $(s^t)$ and $(s_1 s_2 \cdots s_t)$ with $s_1 + s_2 + \cdots + s_t = st = n$, and where it is
not necessary that all the $s$'s should be different from zero. Earlier Wilks [19]
had considered the same problem for $t = 5$ and $n = 25$.

In a very interesting paper Olds [15] in 1938 used permanents to express a
moment generating function suitable for the problem in question. He obtained
factorial moments and the first four ordinary moments about the mean, first
for two decks with composition $(4^4)$, and then for two decks of composition $(s^t)$.
In 1938 Stevens [17] considered a contingency table in connection with
matchings between two sets of $n$ objects each, and gave the means, variances, and
covariances of the single cell entries and various sub-totals of the cell entries.
Stevens [18] also gave a treatment of the problem of matchings between two
decks which was based on elementary considerations. In 1940 Greenwood [6]
gave the first four moments of the distribution of hits between two decks of any
composition whatever, generalizing the problem which had been treated earlier
by Olds [15]. Finally in 1941, Greville [8] gave the exact distribution of hits
for matchings between two decks of arbitrary composition. He also considered
the problem from the standpoint of a contingency table, as had been done
earlier by Stevens.



In 1939 Kullback [14] considered matchings between two sequences obtained
by drawing at random a single element in turn from each of $n$ urns $U_i$ containing
elements of $r$ types $E_j$ in the respective proportions $p_{ij}$. He showed that if
the process of drawing were indefinitely repeated the distribution of hits would
be that of a Poisson series.

The work which has been done thus far applies to the problem of matching 
two decks of cards. In the present paper a method is developed for obtaining 
the moments of the distribution of hits for matchings between three or more 
decks of cards of arbitrary composition. 


2. Matchings between two decks of cards. In the present paper it will be
convenient to take as the point of departure the method used by Wilks [19]
in his treatment of the problem of hits occurring under random matching of two
decks of 25 cards each, namely a target deck with composition $(5^5)$ and a
matching deck with composition $(s_i)$, $i = 1, 2, 3, \dots, 5$, $\sum_i s_i = 25$. He showed that

\[
\phi = \frac{1}{\begin{bmatrix} 25 \\ s_i \end{bmatrix}}
(x_1 e^{\theta} + x_2 + \cdots + x_5)^5 (x_1 + x_2 e^{\theta} + x_3 + \cdots + x_5)^5
\cdots (x_1 + x_2 + \cdots + x_5 e^{\theta})^5, \tag{1}
\]

where

\[
\begin{bmatrix} 25 \\ s_i \end{bmatrix} = \frac{25!}{s_1!\, s_2! \cdots s_5!},
\]

is a suitable generating function for obtaining the moments of the distribution.
In fact, if we define an operator $K_{s_1 s_2 \cdots s_5}$ as

\[
K_{s_1 s_2 \cdots s_5}\, u = \text{coefficient of } x_1^{s_1} x_2^{s_2} \cdots x_5^{s_5} \text{ in } u, \tag{2}
\]

where $u = u(x_1, x_2, \dots, x_5)$, and if $h$ denotes the number of hits, then for
$r = 1, 2, \dots, 5$,

\[
P(h = r) = \text{coefficient of } e^{r\theta} \text{ in } K_{s_1 s_2 \cdots s_5}\, \phi. \tag{3}
\]

And it is readily seen that

\[
E(h^r) = K_{s_1 s_2 \cdots s_5} \left. \frac{\partial^r \phi}{\partial \theta^r} \right|_{\theta = 0}. \tag{4}
\]

Wilks' $\phi$ function involves a particular order for the target deck. If we are to
generalize and obtain moments for matchings between more than two decks,
it is obvious that we must devise a procedure which will, in the case of two
decks, be perfectly symmetrical and not require that one deck be given a
preferred status. In the case of two decks this is readily accomplished by the use
of Kronecker deltas, and in the case of three or more decks by the use of obvious
generalizations of these deltas.








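For two decks of 25 cards each with compositions $(5^5)$ we need only let

\[
\phi = \left( x_i y_j e^{\delta_{ij}\theta} \right)^{25}
= \left( \sum_{i,j=1}^{5} x_i y_j e^{\delta_{ij}\theta} \right)^{25}, \tag{5}
\]

where $\delta_{ii} = 1$; $\delta_{ij} = 0$, $i \neq j$. Then, if

\[
K_{n_{11} n_{12} \cdots n_{15} \cdot n_{21} n_{22} \cdots n_{25}}\, u
= \text{coefficient of } x_1^{n_{11}} x_2^{n_{12}} \cdots x_5^{n_{15}}\, y_1^{n_{21}} y_2^{n_{22}} \cdots y_5^{n_{25}} \text{ in } u, \tag{6}
\]

where $u = u(x_1, x_2, \dots, x_5, y_1, y_2, \dots, y_5)$, it follows that

\[
E(h^r) = \frac{\left. K_{55555 \cdot 55555}\, \dfrac{\partial^r \phi}{\partial \theta^r} \right|_{\theta = 0}}
{\left. K_{55555 \cdot 55555}\, \phi \right|_{\theta = 0}}. \tag{7}
\]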


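More generally, for two decks of $n$ cards each, the cards being of $k$ types, and
the decks having compositions $(n_{11}, n_{12}, \dots, n_{1k})$, $(n_{21}, n_{22}, \dots, n_{2k})$
respectively, we let

\[
\phi = u^n = \left( x_i y_j e^{\delta_{ij}\theta} \right)^{n}
= \left( \sum_{i,j=1}^{k} x_i y_j e^{\delta_{ij}\theta} \right)^{n}. \tag{8}
\]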


The factors of $\phi$ are in one-to-one correspondence with the $n$ events of dealing
a card from each of the two decks. The values which can be assumed by the
subscripts $i$ and $j$ are in one-to-one correspondence with the $k$ types of cards.
The symbol $x_i$ corresponds to the first deck, $y_j$ to the second, the subscripts $i$
and $j$ corresponding to the different types of cards in each deck. The expansion
of $\phi$ consists of all products which can be formed by choosing one and only one
pair $x_\alpha y_\beta$ from each factor of $\phi$ as a factor of the product. In forming any term
of $\phi$, choosing $x_\alpha y_\alpha$ from any factor of $\phi$ corresponds to dealing a card of type $\alpha$
from both decks, and introduces $e^{\theta}$ into the coefficient of the term. Choosing
$x_\alpha y_\beta$ from any factor corresponds to dealing a card of type $\alpha$ from the first
deck, $\beta$ from the second. If $\alpha \neq \beta$, then, since $\delta_{ij} = 0$, $i \neq j$, $e^{\theta}$ is not introduced
into the coefficient. Therefore in the coefficient of any term of $\phi$, $e^{\theta}$ will be
raised to a power, say $s$, which is equal to the number of factors of $\phi$ from which
pairs $x_\alpha y_\alpha$ have been chosen.
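The total number of ways in which the term

\[
x_1^{n_{11}} x_2^{n_{12}} \cdots x_k^{n_{1k}}\, y_1^{n_{21}} y_2^{n_{22}} \cdots y_k^{n_{2k}}
\]

can arise is equal to the number of ways in which two decks of types $(n_{1i})$, $(n_{2i})$
respectively can be dealt (where $(n_{1i}) = (n_{11} n_{12} \cdots n_{1k})$ and similarly for $(n_{2i})$).
But this is given by

\[
\left. K_{n_{11} \cdots n_{1k} \cdot n_{21} \cdots n_{2k}}\, \phi \right|_{\theta = 0}
= K_{n_{11} \cdots n_{1k} \cdot n_{21} \cdots n_{2k}} \left( \sum_{i=1}^{k} x_i \sum_{i=1}^{k} y_i \right)^{n}
= K_{n_{11} n_{12} \cdots n_{1k}} \left( \sum_{i=1}^{k} x_i \right)^{n}
\times K_{n_{21} n_{22} \cdots n_{2k}} \left( \sum_{i=1}^{k} y_i \right)^{n}. \tag{9}
\]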













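The coefficient of $e^{s\theta}$ in $K_{n_{11} \cdots n_{1k} \cdot n_{21} \cdots n_{2k}}\, \phi$ is the total number of ways in
which the term $x_1^{n_{11}} x_2^{n_{12}} \cdots x_k^{n_{1k}}\, y_1^{n_{21}} y_2^{n_{22}} \cdots y_k^{n_{2k}}$ can be formed subject to the
restriction that pairs $x_i y_j$ with $i = j$ are chosen from $s$ of the factors of $\phi$. But
this is precisely the number of ways in which the two decks can be dealt so that
there will be $s$ hits. Hence if, as above, $h$ is the number of hits, the probability
that $h = s$, assuming all permutations in each deck to be equally likely, is
given by

\[
P(h = s) = \frac{\text{coefficient of } e^{s\theta} \text{ in } K_{n_{11} \cdots n_{1k} \cdot n_{21} \cdots n_{2k}}\, \phi}
{\left. K_{n_{11} \cdots n_{1k} \cdot n_{21} \cdots n_{2k}}\, \phi \right|_{\theta = 0}}. \tag{10}
\]

Since this is true for all values of $s$ it follows that

\[
E(h^r) = \frac{\left. K_{n_{11} \cdots n_{1k} \cdot n_{21} \cdots n_{2k}}\, \dfrac{\partial^r \phi}{\partial \theta^r} \right|_{\theta = 0}}
{\left. K_{n_{11} \cdots n_{1k} \cdot n_{21} \cdots n_{2k}}\, \phi \right|_{\theta = 0}}. \tag{11}
\]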
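The counting interpretation behind (10) and (11) can be checked by brute force on very small decks. The following is only an illustrative sketch, not the author's procedure: instead of expanding $\phi$, it enumerates every arrangement of the second deck against a fixed first deck and tallies the hits. The function names and the two six-card decks are hypothetical choices made for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def hit_distribution(deck1, deck2):
    """Exact P(h = s) for two small decks given as sequences of card types."""
    if len(deck1) != len(deck2):
        raise ValueError("decks must contain the same number of cards")
    counts = Counter()
    # Holding deck1 fixed loses nothing: only the relative order matters.
    for arrangement in permutations(deck2):
        hits = sum(a == b for a, b in zip(deck1, arrangement))
        counts[hits] += 1
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


def raw_moment(dist, r):
    """E(h^r) computed from the exact distribution."""
    return sum(p * s ** r for s, p in dist.items())


# Hypothetical decks with compositions (2, 2, 2) and (1, 2, 3).
deck_a = [1, 1, 2, 2, 3, 3]
deck_b = [1, 2, 2, 3, 3, 3]
dist = hit_distribution(deck_a, deck_b)
print(dist)
print(raw_moment(dist, 1), raw_moment(dist, 2))
```

The enumeration grows as $n!$, so it is useful only as a check on the closed-form moments for decks of a handful of cards.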





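Since

\[
\left. \frac{\partial \phi}{\partial \theta} \right|_{\theta = 0}
= \left. n u^{n-1} \sum_{i,j=1}^{k} \delta_{ij} x_i y_j e^{\delta_{ij}\theta} \right|_{\theta = 0}
= n \left( \sum_{i=1}^{k} x_i y_i \right) \left( \sum_{i=1}^{k} x_i \sum_{j=1}^{k} y_j \right)^{n-1},
\]

we have at once

\[
E(h) = \sum_{i=1}^{k} \frac{n}{\begin{bmatrix} n \\ n_{1i} \end{bmatrix} \begin{bmatrix} n \\ n_{2i} \end{bmatrix}}
\, K_{n_{11} \cdots (n_{1i} - 1) \cdots n_{1k}} \left( \sum_{i=1}^{k} x_i \right)^{n-1}
K_{n_{21} \cdots (n_{2i} - 1) \cdots n_{2k}} \left( \sum_{j=1}^{k} y_j \right)^{n-1}
\]

\[
= \sum_{i=1}^{k} \frac{n}{\begin{bmatrix} n \\ n_{1i} \end{bmatrix} \begin{bmatrix} n \\ n_{2i} \end{bmatrix}}
\cdot \frac{(n-1)!}{n_{11}! \cdots n_{1,i-1}!\,(n_{1i} - 1)!\, n_{1,i+1}! \cdots n_{1k}!}
\cdot \frac{(n-1)!}{n_{21}! \cdots n_{2,i-1}!\,(n_{2i} - 1)!\, n_{2,i+1}! \cdots n_{2k}!}
\]

\[
= \frac{1}{n} \sum_{i=1}^{k} n_{1i} n_{2i}. \tag{12}
\]

It is an equally straightforward matter to show that

\[
E(h^2) = \sum_{i=1}^{k} \left[ \frac{n_{1i} n_{2i}}{n} + \frac{n_{1i}(n_{1i} - 1)\, n_{2i}(n_{2i} - 1)}{n(n-1)} \right]
+ \sum_{i \neq j} \frac{n_{1i} n_{2i} n_{1j} n_{2j}}{n(n-1)} \tag{13}
\]

and that

\[
\sigma_h^2 = \sum_{i=1}^{k} \left[ \frac{n_{1i} n_{2i}}{n} - \frac{n_{1i}^2 n_{2i}^2}{n^2} + \frac{n_{1i}^{(2)} n_{2i}^{(2)}}{n^{(2)}} \right]
+ \sum_{i \neq j} \frac{n_{1i} n_{2i} n_{1j} n_{2j}}{n^2 (n-1)}, \tag{14}
\]

where $m^{(2)}$ denotes $m(m-1)$.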








Evidently any of the $n_{1i}$ and $n_{2i}$ may be zero, provided only that
$\sum_{i=1}^{k} n_{1i} = \sum_{i=1}^{k} n_{2i} = n$. The case of two decks with unequal numbers of cards $m$ and $n$
($m < n$) is readily handled by substituting for the smaller deck one obtained
by adding $n - m$ "blank" cards, that is, cards of any type not already appearing
in either deck, as indicated by Greville [8], who however obtained his results
by considering a preferred order for one of the decks.

EXAMPLE 1. In the case of the decks treated by Wilks [19], $n = 25$, $k = 5$,
$n_{1i} = n_{2i} = 5$. Hence from (12)

\[
E(h) = \sum_{i=1}^{5} \frac{5 \cdot 5}{25} = 5,
\]

and from (14)

\[
\sigma_h^2 = \sum_{i=1}^{5} \left[ \frac{5 \cdot 5}{25} - \frac{5^2 \cdot 5^2}{(25)^2} + \frac{5 \cdot 4 \cdot 5 \cdot 4}{25 \cdot 24} \right]
+ \sum_{i \neq j} \frac{5 \cdot 5 \cdot 5 \cdot 5}{(25)^2\, 24} = \frac{25}{6}.
\]

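EXAMPLE 2. Suppose we have two decks as shown by the scheme

                         Type of card
                    1    2    3    4    5     Total of all types
  No. in deck A     5    7    8    0    0             20
  No. in deck B     0    3    4    6    2             15

Here deck B has five fewer cards than deck A. Hence we must presume that
there are six types of cards in all, and that the decks have the respective
distributions (578000) and (034625). We then have at once

\[
E(h) = \sum_{i=1}^{6} \frac{n_{1i} n_{2i}}{n}
= \frac{1}{20} \left[ 5 \cdot 0 + 3 \cdot 7 + 4 \cdot 8 + 0 + 0 + 0 \right] = 2.65,
\]

\[
\sigma_h^2 = \sum_{i=1}^{6} \left[ \frac{n_{1i} n_{2i}}{n} - \frac{n_{1i}^2 n_{2i}^2}{n^2} + \frac{n_{1i}^{(2)} n_{2i}^{(2)}}{n^{(2)}} \right]
+ \sum_{i \neq j} \frac{n_{1i} n_{2i} n_{1j} n_{2j}}{n^2 (n-1)}
\]

\[
= 2.65 - \frac{1}{(20)^2} \left\{ (3 \cdot 7)^2 + (4 \cdot 8)^2 \right\}
+ \frac{1}{20 \cdot 19} \left\{ 3 \cdot 2 \cdot 7 \cdot 6 + 4 \cdot 3 \cdot 8 \cdot 7 \right\}
+ \frac{1}{(20)^2\, 19} \left\{ 3 \cdot 7 \cdot 4 \cdot 8 + 4 \cdot 8 \cdot 3 \cdot 7 \right\}
\approx 1.596.
\]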
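Formulas (12) and (14) are easy to evaluate mechanically. The sketch below is an illustration only: the function name `two_deck_mean_variance` is a hypothetical choice, and exact rational arithmetic is used so that the two worked examples can be reproduced without rounding.

```python
from fractions import Fraction


def two_deck_mean_variance(n1, n2):
    """Mean (12) and variance (14) of the number of hits for two decks.

    n1 and n2 list the number of cards of each type; blank types must
    already have been added so that both decks hold n cards.
    """
    n = sum(n1)
    if n != sum(n2) or len(n1) != len(n2):
        raise ValueError("decks must have equal size and the same type list")
    k = len(n1)
    mean = Fraction(sum(a * b for a, b in zip(n1, n2)), n)
    var = Fraction(0)
    for i in range(k):
        a, b = n1[i], n2[i]
        # Terms of the single sum in (14).
        var += Fraction(a * b, n) - Fraction(a * a * b * b, n * n)
        var += Fraction(a * (a - 1) * b * (b - 1), n * (n - 1))
        # Terms of the double sum over i != j in (14).
        for j in range(k):
            if j != i:
                var += Fraction(a * b * n1[j] * n2[j], n * n * (n - 1))
    return mean, var


# Example 1: two decks of composition (5^5).
print(two_deck_mean_variance([5] * 5, [5] * 5))
# Example 2: decks (578000) and (034625) after adding five blank cards.
print(two_deck_mean_variance([5, 7, 8, 0, 0, 0], [0, 3, 4, 6, 2, 5]))
```

For Example 2 the exact variance is the fraction 12129/7600, which is the value rounded above.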











3. Matchings between three decks. Let the three decks be of types
$(n_{11} n_{12} \cdots n_{1q})$, $(n_{21} n_{22} \cdots n_{2q})$, $(n_{31} n_{32} \cdots n_{3q})$ respectively, with
$\sum_{i=1}^{q} n_{\alpha i} = n$, $\alpha = 1, 2, 3$, and let

\[
\phi = \left[ \sum_{i,j,k=1}^{q} x_i y_j z_k\,
e^{\delta_{ijk}\theta_{123} + \delta_{ij}\theta_{12} + \delta_{ik}\theta_{13} + \delta_{jk}\theta_{23}} \right]^{n} = u^n, \tag{15}
\]

where

\[
\delta_{iii} = 1, \qquad \delta_{ijk} = 0, \quad i, j, k \text{ not all equal}, \tag{16}
\]

and the other deltas are the usual Kronecker symbols.

Each factor of $\phi$ corresponds to one deal from each of the three decks. The
symbols $x$, $y$, and $z$ correspond respectively to cards in the first, second, and
third decks. The subscripts $i, j, k = 1, 2, \dots, q$ correspond to the types of
cards, there being $q$ distinct types.

Choosing $x_\alpha y_\alpha z_\alpha$ from a factor of $\phi$ corresponds to a deal in which a card of
type $\alpha$ is dealt from all three decks, and introduces $e^{\theta_{123} + \theta_{12} + \theta_{13} + \theta_{23}}$ into the
coefficient of the corresponding term in the expansion of $\phi$. Similarly, choosing
$x_\alpha y_\alpha z_\beta$, $\beta \neq \alpha$, corresponds to a hit between the first and second decks, and
introduces $e^{\theta_{12}}$ into the coefficient. Similarly choosing $x_\alpha y_\beta z_\alpha$ introduces $e^{\theta_{13}}$;
$x_\beta y_\alpha z_\alpha$ introduces $e^{\theta_{23}}$. Choosing $x_\alpha y_\beta z_\gamma$, $\alpha \neq \beta \neq \gamma \neq \alpha$, corresponds to a deal
with no hits, and introduces no powers of $e$ into the coefficient, since all the $\delta$'s
are zero.

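Let $K_{n_{1i} \cdot n_{2j} \cdot n_{3k}}$ be defined by

\[
K_{n_{1i} \cdot n_{2j} \cdot n_{3k}}\, u = \text{coefficient of }
x_1^{n_{11}} \cdots x_q^{n_{1q}}\, y_1^{n_{21}} \cdots y_q^{n_{2q}}\, z_1^{n_{31}} \cdots z_q^{n_{3q}} \text{ in } u. \tag{17}
\]

Then the coefficient of $e^{h_{123}\theta_{123}}$ in $K_{n_{1i} \cdot n_{2j} \cdot n_{3k}}\, \phi \,|_{\theta_{12} = \theta_{13} = \theta_{23} = 0}$ is the number of ways
in which the cards can be dealt so as to yield precisely $h_{123}$ triples, or hits between
all three decks. Similarly the coefficient of $e^{h_{12}\theta_{12}}$ in $K_{n_{1i} \cdot n_{2j} \cdot n_{3k}}\, \phi \,|_{\theta_{13} = \theta_{23} = \theta_{123} = 0}$
is the number of ways in which the cards can be dealt so as to yield precisely $h_{12}$
hits between the first and second decks, with corresponding results for the first
and third ($h_{13}$) and second and third ($h_{23}$) decks.

By the same reasoning as before then, we have

\[
E(h_{123}^r) = \frac{\left. K_{n_{1i} \cdot n_{2j} \cdot n_{3k}}\, \dfrac{\partial^r \phi}{\partial \theta_{123}^r} \right|_{\theta\text{'s} = 0}}
{\left. K_{n_{1i} \cdot n_{2j} \cdot n_{3k}}\, \phi \right|_{\theta\text{'s} = 0}}, \tag{18}
\]

\[
E(h_{12}^r) = \frac{\left. K_{n_{1i} \cdot n_{2j} \cdot n_{3k}}\, \dfrac{\partial^r \phi}{\partial \theta_{12}^r} \right|_{\theta\text{'s} = 0}}
{\left. K_{n_{1i} \cdot n_{2j} \cdot n_{3k}}\, \phi \right|_{\theta\text{'s} = 0}}, \tag{19}
\]

with similar results for $h_{13}$ and $h_{23}$. And it is a straightforward matter to
show that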











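\[
E(h_{123}) = n \sum_{i=1}^{q} \prod_{\alpha=1}^{3} \frac{n_{\alpha i}}{n}, \tag{20}
\]

\[
E(h_{123}^2) = n \sum_{i=1}^{q} \prod_{\alpha=1}^{3} \frac{n_{\alpha i}}{n}
+ n(n-1) \sum_{i=1}^{q} \prod_{\alpha=1}^{3} \frac{n_{\alpha i}^{(2)}}{n^{(2)}}
+ n(n-1) \sum_{\substack{i,j=1 \\ (i \neq j)}}^{q} \prod_{\alpha=1}^{3} \frac{n_{\alpha i} n_{\alpha j}}{n^{(2)}}, \tag{21}
\]

\[
E(h_{12}) = \frac{1}{n^2} \sum_{i,k} n_{1i} n_{2i} n_{3k}, \tag{22}
\]

\[
E(h_{13}) = \frac{1}{n^2} \sum_{j,k} n_{1k} n_{2j} n_{3k}, \tag{23}
\]

\[
E(h_{23}) = \frac{1}{n^2} \sum_{i,j} n_{1i} n_{2j} n_{3j}, \tag{24}
\]

\[
E(h_{12}^2) = \frac{1}{n^2} \sum_{i,k} n_{1i} n_{2i} n_{3k}
+ \frac{1}{n^2 (n-1)^2} \Bigg[ \sum_{i,k} n_{1i}^{(2)} n_{2i}^{(2)} n_{3k}^{(2)}
+ \sum_{i,\, k \neq r} n_{1i}^{(2)} n_{2i}^{(2)} n_{3k} n_{3r}
+ \sum_{k,\, i \neq l} n_{1i} n_{1l} n_{2i} n_{2l} n_{3k}^{(2)}
+ \sum_{i \neq l,\, k \neq r} n_{1i} n_{1l} n_{2i} n_{2l} n_{3k} n_{3r} \Bigg], \tag{25}
\]

with corresponding results for other moments. It is understood each summation
index takes values from 1 to $q$.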

As before, if the decks do not all have the same total number of cards it is 
merely necessary to introduce one or more sets of “blank” cards. Thus we 
would replace decks with the compositions (57800), (03462), (00335) by hypo- 
thetical decks (5780000), (0346250), (0033509) and proceed as before. 

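EXAMPLE 3. For three decks of 25 cards, consisting of five of each of five
kinds we have $n = 25$, $n_{\alpha i} = 5$, $\alpha = 1, 2, 3$, $i = 1, 2, \dots, 5$. Hence

\[
E(h_{123}) = 25 \sum_{i=1}^{5} \prod_{\alpha=1}^{3} \frac{5}{25} = 1,
\]

\[
E(h_{123}^2) = 25 \sum_{i=1}^{5} \left( \frac{5}{25} \right)^3
+ 25 \cdot 24 \sum_{i=1}^{5} \left( \frac{5 \cdot 4}{25 \cdot 24} \right)^3
+ 25 \cdot 24 \sum_{i \neq j} \left( \frac{5^2}{25 \cdot 24} \right)^3
= 1 \tfrac{47}{48},
\]

\[
\sigma_{h_{123}}^2 = \frac{47}{48},
\]

\[
E(h_{12}) = \frac{1}{(25)^2} \sum_{i,k=1}^{5} 5^3 = 5,
\]

\[
E(h_{12}^2) = \frac{1}{(25)^2} \sum_{i,k=1}^{5} 5^3
+ \frac{1}{(25)^2 (24)^2} \Bigg[ \sum_{i,k} 5^3 \cdot 4^3 + \sum_{i,\, k \neq r} 5^4 \cdot 4^2
+ \sum_{k,\, i \neq l} 5^5 \cdot 4 + \sum_{i \neq l,\, k \neq r} 5^6 \Bigg]
= 29 \tfrac{1}{6},
\]

\[
\sigma_{h_{12}}^2 = 4 \tfrac{1}{6},
\]

with similar results for $E(h_{13})$, $E(h_{23})$, $\sigma_{h_{13}}^2$, and $\sigma_{h_{23}}^2$.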
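The triple-hit moments (20) and (21) can be evaluated in the same mechanical way. This is a sketch under the reading of (20) and (21) given above; the function name `triple_hit_moments` is hypothetical, and exact fractions are used so that the figures of Example 3 can be compared directly.

```python
from fractions import Fraction
from math import prod


def triple_hit_moments(decks):
    """Return E(h123) and E(h123^2) from (20) and (21).

    decks is a list of three lists; decks[a][i] is the number of cards
    of type i in deck a, and all three decks hold n cards.
    """
    n = sum(decks[0])
    if any(sum(d) != n for d in decks):
        raise ValueError("all decks must hold the same number of cards")
    q = len(decks[0])
    pairs = n * (n - 1)  # n^(2), the number of ordered pairs of deals
    mean = n * sum(prod(Fraction(d[i], n) for d in decks) for i in range(q))
    # Two different deals both yielding triples of the same type i.
    same = sum(prod(Fraction(d[i] * (d[i] - 1), pairs) for d in decks)
               for i in range(q))
    # Two different deals yielding triples of different types i != j.
    diff = sum(prod(Fraction(d[i] * d[j], pairs) for d in decks)
               for i in range(q) for j in range(q) if i != j)
    second = mean + pairs * (same + diff)
    return mean, second


mean, second = triple_hit_moments([[5] * 5, [5] * 5, [5] * 5])
print(mean, second, second - mean ** 2)
```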


4. Generalization to any number of decks. If the moments of the distribution
of hits (doubles, triples, quadruples, ...) in matching any number of
decks are desired, these can be obtained by using an obvious generalization of
(15). Thus for four decks we would define $\delta_{iiii} = 1$, $\delta_{ijkl} = 0$, $i, j, k, l$ not all
equal, and use


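\[
\phi = \left[ \sum_{i,j,k,l=1}^{q} x_i y_j z_k w_l\,
e^{\delta_{ijkl}\theta_{1234} + \delta_{ijk}\theta_{123} + \cdots + \delta_{ij}\theta_{12} + \cdots + \delta_{kl}\theta_{34}} \right]^{n}. \tag{26}
\]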


However, it is evident that as the number of decks is increased the summations 
involved and the manipulation of the (generalized) K operators rapidly become 
complicated. 


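5. Application of our moment-generating technique to two-way contingency
tables. The moment-generating technique which we have discussed has wider
applications than merely to matching problems. As an example of considerable
interest we shall consider the contingency problem. Consider the array

\[
\| n_{\alpha\beta} \|, \qquad \alpha = 1, 2, \dots, r; \quad \beta = 1, 2, \dots, s; \qquad
\sum_{\alpha} n_{\alpha\cdot} = \sum_{\beta} n_{\cdot\beta} = n, \tag{27}
\]

and also the function

\[
\phi = \prod_{\alpha=1}^{r} \left( x_\beta e^{\theta_{\alpha\beta}} \right)^{n_{\alpha\cdot}}
= \prod_{\alpha=1}^{r} \left( \sum_{\beta=1}^{s} x_\beta e^{\theta_{\alpha\beta}} \right)^{n_{\alpha\cdot}}. \tag{28}
\]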

If $i$ and $j$ are particular values of $\alpha$ and $\beta$ respectively, then to the $i$-th row
of the array corresponds the product $(x_\beta e^{\theta_{i\beta}})^{n_{i\cdot}}$, consisting of $n_{i\cdot}$ identical
factors $x_\beta e^{\theta_{i\beta}}$, one such factor corresponding to each of the $n_{i\cdot}$ elements in the
row. To the $j$-th column of the array corresponds the $x_j$ which appears in each
of the factors of $\phi$. To the $ij$-th cell of the array corresponds $e^{\theta_{ij}}$ which appears
only in the factors $(x_\beta e^{\theta_{i\beta}})^{n_{i\cdot}}$, and in each of these only as the coefficient of $x_j$.








The expansion of $\phi$ consists of all products which can be formed by taking as
factors one and only one element $x_\beta e^{\theta_{\alpha\beta}}$ (not summed) from each factor of $\phi$.
But taking $x_j e^{\theta_{ij}}$ from one of the factors $(x_\beta e^{\theta_{i\beta}})^{n_{i\cdot}}$ of $\phi$ corresponds exactly to
putting an element in the $ij$-th cell of a lattice such as (27). Hence every term
in the expansion of $\phi$ corresponds to a particular distribution in such a lattice.
Moreover, all terms of $\phi$ correspond to distributions in which the row totals
are $n_{\alpha\cdot}$, for we must take $n_{i\cdot}$ elements from the product $(x_\beta e^{\theta_{i\beta}})^{n_{i\cdot}}$. Further,
those terms in which the $x_\beta$ appear in the particular product $x_1^{n_{\cdot 1}} x_2^{n_{\cdot 2}} \cdots x_s^{n_{\cdot s}}$
correspond to distributions in which the column totals are $n_{\cdot 1}, n_{\cdot 2}, \dots, n_{\cdot s}$,
since choosing $n_{\cdot j}$ elements $x_j e^{\theta_{\alpha j}}$ corresponds to putting $n_{\cdot j}$ elements in the
$j$-th column and some row of the lattice.

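Expanding $\phi$ we obtain

\[
\phi = \sum \prod_{\alpha=1}^{r} \begin{bmatrix} n_{\alpha\cdot} \\ n_{\alpha\beta} \end{bmatrix}
e^{\sum_{\alpha,\beta} n_{\alpha\beta}\theta_{\alpha\beta}}\, x_1^{n_{\cdot 1}} x_2^{n_{\cdot 2}} \cdots x_s^{n_{\cdot s}} + \cdots, \tag{29}
\]

where $\begin{bmatrix} n_{\alpha\cdot} \\ n_{\alpha\beta} \end{bmatrix} = n_{\alpha\cdot}! / (n_{\alpha 1}!\, n_{\alpha 2}! \cdots n_{\alpha s}!)$ and
the summation is over all partitions $(n_{\alpha 1} n_{\alpha 2} \cdots n_{\alpha s})$ of the $n_{\alpha\cdot}$ such that
$(n_{1\beta} n_{2\beta} \cdots n_{r\beta})$ is also a partition of $n_{\cdot\beta}$. It is clear that since every set of
values of the $n_{\alpha\beta}$ subject to the partition restrictions $\sum_{\beta} n_{\alpha\beta} = n_{\alpha\cdot}$, $\sum_{\alpha} n_{\alpha\beta} = n_{\cdot\beta}$
corresponds to a particular distribution of $n$ elements in the lattice (27), every
particular product $\prod_{\alpha=1}^{r} \begin{bmatrix} n_{\alpha\cdot} \\ n_{\alpha\beta} \end{bmatrix}$ corresponds to such a distribution, and represents
the number of ways in which it can arise. Further, the total coefficient
displayed in (29), namely $\sum \prod_{\alpha=1}^{r} \begin{bmatrix} n_{\alpha\cdot} \\ n_{\alpha\beta} \end{bmatrix}$, represents the total number of ways in which
distributions with row totals $n_{\alpha\cdot}$ and column totals $n_{\cdot\beta}$ can arise. Setting all the
$\theta_{\alpha\beta} = 0$ we readily find

\[
\sum \prod_{\alpha=1}^{r} \begin{bmatrix} n_{\alpha\cdot} \\ n_{\alpha\beta} \end{bmatrix}
= \left. K_{n_{\cdot 1} n_{\cdot 2} \cdots n_{\cdot s}}\, \phi \right|_{\theta_{\alpha\beta} = 0}
= K_{n_{\cdot 1} n_{\cdot 2} \cdots n_{\cdot s}} (x_1 + x_2 + \cdots + x_s)^{n}
= \begin{bmatrix} n \\ n_{\cdot\beta} \end{bmatrix}. \tag{30}
\]

Hence the probability of any particular distribution $\| n_{\alpha\beta} \|$ with fixed row totals
$n_{\alpha\cdot}$ and fixed column totals $n_{\cdot\beta}$ is

\[
P\left( \| n_{\alpha\beta} \| \,\big|\, n_{\alpha\cdot}, n_{\cdot\beta} \right)
= \frac{\prod_{\alpha=1}^{r} \begin{bmatrix} n_{\alpha\cdot} \\ n_{\alpha\beta} \end{bmatrix}}{\begin{bmatrix} n \\ n_{\cdot\beta} \end{bmatrix}}. \tag{31}
\]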
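As a small check on (30) and (31), one can enumerate every table with given margins and confirm that the probabilities sum to one. The sketch below is illustrative only; the helper names `compositions`, `tables_with_margins` and `table_probability` are hypothetical, and the margins used are an arbitrary small case.

```python
from fractions import Fraction
from math import factorial, prod


def compositions(total, caps):
    """Yield tuples x with 0 <= x[j] <= caps[j] and sum(x) == total."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def tables_with_margins(rows, cols):
    """Yield every table of non-negative integers with the given margins."""
    if not rows:
        if all(c == 0 for c in cols):
            yield ()
        return
    for row in compositions(rows[0], cols):
        remaining = tuple(c - x for c, x in zip(cols, row))
        for rest in tables_with_margins(rows[1:], remaining):
            yield (row,) + rest


def table_probability(table):
    """Probability (31) of a table, given its own row and column totals."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    numerator = prod(factorial(x) for x in rows) * prod(factorial(x) for x in cols)
    denominator = factorial(n) * prod(factorial(x) for r in table for x in r)
    return Fraction(numerator, denominator)


# Arbitrary small margins chosen for illustration.
all_tables = list(tables_with_margins((3, 2), (2, 2, 1)))
total_probability = sum(table_probability(t) for t in all_tables)
print(len(all_tables), total_probability)
```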


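Moments of the $n_{ij}$. Consider now the result of differentiating $\phi$ with respect
to a particular $\theta_{\alpha\beta}$, say $\theta_{ij}$. We obtain

\[
\frac{\partial \phi}{\partial \theta_{ij}} = \sum{}^{\prime}\, n_{ij} \prod_{\alpha=1}^{r} \begin{bmatrix} n_{\alpha\cdot} \\ n_{\alpha\beta} \end{bmatrix}
e^{\sum_{\alpha,\beta} n_{\alpha\beta}\theta_{\alpha\beta}}\, x_1^{n_{\cdot 1}} x_2^{n_{\cdot 2}} \cdots x_s^{n_{\cdot s}} + \cdots, \tag{32}
\]

where $\sum'$ denotes summation over indices such that $\sum_{\alpha} n_{\alpha\beta} = n_{\cdot\beta}$ ($\beta \neq j$),
$\sum_{\alpha \neq i} n_{\alpha j} + n_{ij} = n_{\cdot j}$. Now $n_{ij} \leq \min(n_{i\cdot}, n_{\cdot j})$, but also $n_{ij}$ can never be
less than $n_{\cdot j} - (n - n_{i\cdot})$. For $n_{\cdot j} = n_{ij} + \sum_{\alpha \neq i} n_{\alpha j}$. Since the maximum value
of $n_{\alpha j} \leq n_{\alpha\cdot}$, the maximum value of $\sum_{\alpha \neq i} n_{\alpha j} \leq \sum_{\alpha \neq i} n_{\alpha\cdot}$. Hence

\[
n_{ij} = n_{\cdot j} - \sum_{\alpha \neq i} n_{\alpha j} \geq n_{\cdot j} - \sum_{\alpha \neq i} n_{\alpha\cdot} = n_{\cdot j} - (n - n_{i\cdot}).
\]

Therefore

\[
\max(0,\, n_{\cdot j} - n + n_{i\cdot}) \leq n_{ij} \leq \min(n_{i\cdot},\, n_{\cdot j}).
\]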


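Accordingly, combining all the terms of (32) in which $n_{ij}$ has a particular value,
$\nu$, we have

\[
\frac{\partial \phi}{\partial \theta_{ij}} = \sum_{\nu = \max(0,\, n_{\cdot j} - n + n_{i\cdot})}^{\min(n_{i\cdot},\, n_{\cdot j})} \nu
\sum_{\sum_{\alpha} n_{\alpha\beta} = n_{\cdot\beta}}^{*} \prod_{\alpha}^{*} \begin{bmatrix} n_{\alpha\cdot} \\ n_{\alpha\beta} \end{bmatrix}
e^{\sum_{\alpha,\beta} n_{\alpha\beta}\theta_{\alpha\beta}}\, x_1^{n_{\cdot 1}} x_2^{n_{\cdot 2}} \cdots x_s^{n_{\cdot s}} + \cdots, \tag{33}
\]

where $\sum^{*}$ denotes summation and $\prod^{*}$ multiplication with $n_{ij} = \nu$.

Since $\sum^{*} \prod^{*}_{\alpha} \begin{bmatrix} n_{\alpha\cdot} \\ n_{\alpha\beta} \end{bmatrix}$, taken over $\sum_{\alpha} n_{\alpha\beta} = n_{\cdot\beta}$, is precisely the number of
distributions $\| n_{\alpha\beta} \|$ for which $n_{ij} = \nu$, it follows that

\[
E(n_{ij} \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \frac{1}{\begin{bmatrix} n \\ n_{\cdot\beta} \end{bmatrix}}
\left. K_{n_{\cdot 1} n_{\cdot 2} \cdots n_{\cdot s}}\, \frac{\partial \phi}{\partial \theta_{ij}} \right|_{\theta_{\alpha\beta} = 0}, \tag{34}
\]

\[
E(n_{ij}^2 \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \frac{1}{\begin{bmatrix} n \\ n_{\cdot\beta} \end{bmatrix}}
\left. K_{n_{\cdot 1} n_{\cdot 2} \cdots n_{\cdot s}}\, \frac{\partial^2 \phi}{\partial \theta_{ij}^2} \right|_{\theta_{\alpha\beta} = 0}. \tag{35}
\]

Similarly it follows that

\[
E(n_{ij}^p n_{kl}^q \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \frac{1}{\begin{bmatrix} n \\ n_{\cdot\beta} \end{bmatrix}}
\left. K_{n_{\cdot 1} n_{\cdot 2} \cdots n_{\cdot s}}\, \frac{\partial^{p+q} \phi}{\partial \theta_{ij}^p\, \partial \theta_{kl}^q} \right|_{\theta_{\alpha\beta} = 0}, \tag{36}
\]

where we may have $i = k$ or $i \neq k$, and $j = l$ or $j \neq l$.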

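By straightforward differentiation and reduction we find that for the array
(27) with given marginal totals $n_{\alpha\cdot}$, $n_{\cdot\beta}$

\[
E(n_{ij}) = \frac{n_{i\cdot} n_{\cdot j}}{n}, \tag{37}
\]

\[
E(n_{ij}^2) = \frac{n_{i\cdot}^{(2)} n_{\cdot j}^{(2)}}{n^{(2)}} + \frac{n_{i\cdot} n_{\cdot j}}{n}, \tag{38}
\]

\[
\sigma_{n_{ij}}^2 = \frac{\left[ n^2 - n(n_{i\cdot} + n_{\cdot j}) + n_{i\cdot} n_{\cdot j} \right] n_{i\cdot} n_{\cdot j}}{n^2 (n-1)}, \tag{39}
\]

\[
E(n_{ij}^3) = \frac{n_{i\cdot}^{(3)} n_{\cdot j}^{(3)}}{n^{(3)}} + 3\, \frac{n_{i\cdot}^{(2)} n_{\cdot j}^{(2)}}{n^{(2)}} + \frac{n_{i\cdot} n_{\cdot j}}{n}, \tag{40}
\]

\[
E(n_{ij}^4) = \frac{n_{i\cdot}^{(4)} n_{\cdot j}^{(4)}}{n^{(4)}} + 6\, \frac{n_{i\cdot}^{(3)} n_{\cdot j}^{(3)}}{n^{(3)}}
+ 7\, \frac{n_{i\cdot}^{(2)} n_{\cdot j}^{(2)}}{n^{(2)}} + \frac{n_{i\cdot} n_{\cdot j}}{n}, \tag{41}
\]

and if $i$ and $k$, $j$ and $l$ are distinct

\[
E(n_{ij}^2 n_{kj}^2) = \frac{n_{i\cdot}^{(2)} n_{k\cdot}^{(2)} n_{\cdot j}^{(4)}}{n^{(4)}}
+ \left( n_{i\cdot}^{(2)} n_{k\cdot} + n_{i\cdot} n_{k\cdot}^{(2)} \right) \frac{n_{\cdot j}^{(3)}}{n^{(3)}}
+ \frac{n_{i\cdot} n_{k\cdot} n_{\cdot j}^{(2)}}{n^{(2)}}, \tag{42}
\]

\[
E(n_{ij}^2 n_{il}^2) = \frac{n_{i\cdot}^{(4)} n_{\cdot j}^{(2)} n_{\cdot l}^{(2)}}{n^{(4)}}
+ \left( n_{\cdot j}^{(2)} n_{\cdot l} + n_{\cdot j} n_{\cdot l}^{(2)} \right) \frac{n_{i\cdot}^{(3)}}{n^{(3)}}
+ \frac{n_{i\cdot}^{(2)} n_{\cdot j} n_{\cdot l}}{n^{(2)}}, \tag{43}
\]

\[
E(n_{ij}^2 n_{kl}^2) = \frac{n_{i\cdot}^{(2)} n_{k\cdot}^{(2)} n_{\cdot j}^{(2)} n_{\cdot l}^{(2)}}{n^{(4)}}
+ \frac{n_{i\cdot}^{(2)} n_{k\cdot} n_{\cdot j}^{(2)} n_{\cdot l}}{n^{(3)}}
+ \frac{n_{i\cdot} n_{k\cdot}^{(2)} n_{\cdot j} n_{\cdot l}^{(2)}}{n^{(3)}}
+ \frac{n_{i\cdot} n_{k\cdot} n_{\cdot j} n_{\cdot l}}{n^{(2)}}. \tag{44}
\]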
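The single-cell moments (37) to (41) can be checked by summing directly over the admissible values $\max(0, n_{\cdot j} - n + n_{i\cdot}) \leq n_{ij} \leq \min(n_{i\cdot}, n_{\cdot j})$. The sketch below assumes the standard fact that, with both margins fixed, a single cell follows the hypergeometric law; the function names and the trial margins are hypothetical.

```python
from fractions import Fraction
from math import comb


def falling(x, r):
    """Falling factorial x^(r) = x (x - 1) ... (x - r + 1)."""
    out = 1
    for t in range(r):
        out *= x - t
    return out


def cell_moment_by_summation(ni, nj, n, power):
    """E(n_ij^power) summed over the admissible range of n_ij."""
    low, high = max(0, nj - n + ni), min(ni, nj)
    total = comb(n, nj)
    return sum(Fraction(v ** power * comb(ni, v) * comb(n - ni, nj - v), total)
               for v in range(low, high + 1))


def cell_moment_by_formula(ni, nj, n, power):
    """Right-hand sides of (37), (38), (40), (41) for power = 1, 2, 3, 4."""
    # Coefficients expressing x^power through x^(1), x^(2), ..., x^(power).
    coefficients = {1: [1], 2: [1, 1], 3: [1, 3, 1], 4: [1, 7, 6, 1]}[power]
    return sum(c * Fraction(falling(ni, r) * falling(nj, r), falling(n, r))
               for r, c in enumerate(coefficients, start=1))


def cell_variance_formula(ni, nj, n):
    """Right-hand side of (39)."""
    return Fraction((n * n - n * (ni + nj) + ni * nj) * ni * nj, n * n * (n - 1))


for power in range(1, 5):
    print(power,
          cell_moment_by_summation(5, 4, 12, power),
          cell_moment_by_formula(5, 4, 12, power))
```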












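Moments of the distribution of Chi Square. For the array (27)

\[
\chi^2 = \sum_{\alpha,\beta} \frac{\left( n_{\alpha\beta} - \dfrac{n_{\alpha\cdot} n_{\cdot\beta}}{n} \right)^2}{\dfrac{n_{\alpha\cdot} n_{\cdot\beta}}{n}}
= \sum_{\alpha,\beta} \left[ \frac{n}{n_{\alpha\cdot} n_{\cdot\beta}}\, n_{\alpha\beta}^2 - 2 n_{\alpha\beta} + \frac{n_{\alpha\cdot} n_{\cdot\beta}}{n} \right]. \tag{45}
\]

Hence, using the above results we can, theoretically, find all the moments of the
exact distribution of $\chi^2$. It is not difficult to show that

\[
E(\chi^2) = \frac{n}{n-1}\, (r - 1)(s - 1). \tag{46}
\]

The value of $E[(\chi^2)^2]$ and the variance of $\chi^2$ were found by straightforward
application of our methods and the results agreed with those given by
Haldane [10].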
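Result (46) can be confirmed exactly for a small table by enumerating every table with the given margins, weighting each by (31), and averaging $\chi^2$ as written in (45). This is a sketch for illustration; the helper names are hypothetical, the margins are an arbitrary choice, and all marginal totals are assumed positive.

```python
from fractions import Fraction
from math import factorial, prod


def compositions(total, caps):
    """Yield tuples x with 0 <= x[j] <= caps[j] and sum(x) == total."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def tables_with_margins(rows, cols):
    """Yield every table of non-negative integers with the given margins."""
    if not rows:
        if all(c == 0 for c in cols):
            yield ()
        return
    for row in compositions(rows[0], cols):
        remaining = tuple(c - x for c, x in zip(cols, row))
        for rest in tables_with_margins(rows[1:], remaining):
            yield (row,) + rest


def table_probability(table, rows, cols):
    """Probability (31) of a table with row totals rows and column totals cols."""
    n = sum(rows)
    numerator = prod(factorial(x) for x in rows) * prod(factorial(x) for x in cols)
    denominator = factorial(n) * prod(factorial(x) for r in table for x in r)
    return Fraction(numerator, denominator)


def chi_square(table, rows, cols):
    """The statistic (45), computed exactly."""
    n = sum(rows)
    stat = Fraction(0)
    for a, row in enumerate(table):
        for b, cell in enumerate(row):
            expected = Fraction(rows[a] * cols[b], n)
            stat += (cell - expected) ** 2 / expected
    return stat


def mean_chi_square(rows, cols):
    """Exact E(chi^2) over all tables with the given margins."""
    return sum(table_probability(t, rows, cols) * chi_square(t, rows, cols)
               for t in tables_with_margins(rows, cols))


rows, cols = (3, 2, 2), (4, 3)   # r = 3, s = 2, n = 7
print(mean_chi_square(rows, cols))
```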

The writer is indebted to Professor Wilks for helpful criticisms and suggestions. 


REFERENCES 

[1] M. S. Bartlett, "Properties of sufficiency and statistical tests," Proc. Roy. Soc., A,
Vol. 160 (1937), pp. 268-282.
[2] Dwight Chapman, "The statistics of the method of correct matchings," Amer. Jour.
Psych., Vol. 46 (1934), pp. 287-298.
[3] Dwight Chapman, "The generalized problem of correct matchings," Annals of Math.
Stat., Vol. 6 (1935), pp. 85-95.
[4] J. A. Greenwood, "Variance of a general matching problem," Annals of Math. Stat.,
Vol. 9 (1938), pp. 56-59.
[5] J. A. Greenwood, "Variance of the ESP call series," Jour. of Parapsychology, Vol. 2
(1938), pp. 60-64.
[6] J. A. Greenwood, "The first four moments of a general matching problem," Annals
of Eug., Vol. 10 (1940), pp. 290-292.
[7] T. N. E. Greville, "Exact probabilities for the matching hypothesis," Jour. of
Parapsychology, Vol. 2 (1938), pp. 55-59.
[8] T. N. E. Greville, "The frequency distribution of a general matching problem,"
Annals of Math. Stat., Vol. 12 (1941), pp. 350-354.
[9] J. B. S. Haldane, "The mean and variance of Chi square, when used as a test of
homogeneity, when expectations are small," Biometrika, Vol. 31 (1940).
[10] J. B. S. Haldane, "The first six moments of Chi-square for an n-fold table with n
degrees of freedom when some expectations are small," Biometrika, Vol. 29
(1939), p. 389.
[11] J. B. S. Haldane, "The exact value of the moments of the distribution of chi-square,
used as a test of goodness of fit, when the expectations are small," Biometrika,
Vol. 29 (1939), p. 133.
[12] E. V. Huntington, "A rating table for card matching experiments," Jour. of
Parapsychology, Vol. 4 (1937), pp. 292-294.
[13] E. V. Huntington, "Exact probabilities in certain card-matching problems," Science,
Vol. 86 (1937), pp. 499-500.
[14] Solomon Kullback, "Note on a matching problem," Annals of Math. Stat., Vol. 10
(1939), pp. 77-80.
[15] E. G. Olds, "A moment-generating function which is useful in solving certain matching
problems," Bull. Amer. Math. Soc., Vol. 44 (1938), pp. 407-413.
[16] T. E. Sterne, "The solution of a problem in probability," Science, Vol. 86 (1937),
pp. 500-501.
[17] W. L. Stevens, "Distribution of entries in a contingency table," Annals of Eugenics,
Vol. 8 (1938), pp. 238-244.
[18] W. L. Stevens, "Tests of significance for extra sensory perception data,"
Psychological Review, Vol. 46 (1938), pp. 142-150.
[19] S. S. Wilks, "Statistical aspects of experiments in telepathy," a lecture delivered to
the Galois Institute of Mathematics, Long Island University, December 4, 1937.










ON THE CHOICE OF THE NUMBER OF CLASS INTERVALS IN THE 
APPLICATION OF THE CHI SQUARE TEST 


By H. B. Mann and A. Wald¹


Columbia University 







Introduction. To test whether a sample has been drawn from a population
with a specified probability distribution, the range of the variable is divided
into a number of class intervals and the statistic

\[
\sum_{i=1}^{k} \frac{(\alpha_i - N p_i)^2}{N p_i} = \chi'^2 \tag{1}
\]

computed. In (1) $k$ is the number of class intervals, $\alpha_i$ the number of
observations in the $i$th class, $p_i$ the probability that an observation falls into the $i$th
class (calculated under the hypothesis to be tested). It is known that under
the null hypothesis (hypothesis to be tested) the statistic (1) has asymptotically
the chi-square distribution with $k - 1$ degrees of freedom, when each $N p_i$ is
large. To test the null hypothesis the upper tail of the chi-square distribution
is used as a critical region.

In the literature only rules of thumb are found as to the choice of the number 
and lengths of the class intervals. It is the purpose of this paper to formulate 
principles for this choice and to determine the number and lengths of the class 
intervals according to these principles. 



















If a choice is made as to the number of class intervals it is always possible to
find alternative hypotheses with class probabilities equal to the class probabilities
under the null hypothesis. The least upper bound of the "distances" of such
alternative distributions from the null hypothesis distribution can evidently be
minimized by making the class probabilities under the null hypothesis equal to
each other. By the distance of two distribution functions we mean the least
upper bound of the absolute value of the difference of the two cumulative
distribution functions. We have therefore based this paper on a procedure by
which the lengths of the class intervals are determined so that the probability
of each class under the null hypothesis is equal to $1/k$ where $k$ is the number of
class intervals.²

Let $C(\Delta)$ be the class of alternative distributions with a distance $> \Delta$ from the
null hypothesis. Let $f(N, k, \Delta)$ be the greatest lower bound of the power of the
chi-square test with sample size $N$ and number of class intervals $k$ with respect
to alternatives in $C(\Delta)$. The maximum of $f(N, k, \Delta)$ with respect to $k$ is a
function $\Phi(N, \Delta)$ of $N$ and $\Delta$. It is most desirable to maximize $f(N, k, \Delta)$ for
such values of $\Delta$ for which $\Phi(N, \Delta)$ is neither too large nor too small and in this
paper we propose to determine $\Delta$ so that $\Phi(N, \Delta)$ is equal to $\tfrac{1}{2}$.

¹ Research under a grant in aid from the Carnegie Corporation of New York.

² This procedure was first used by H. Hotelling ("The consistency and ultimate
distribution of optimum statistics," Trans. Am. Math. Soc., Vol. 32, pp. 851). It has been
advocated by E. J. Gumbel in a paper which will appear shortly.

Hence we introduce the following definitions:

DEFINITION 1. A positive integer $k$ is called best with respect to the number of
observations $N$ if there exists a $\Delta$ such that $f(N, k, \Delta) = \tfrac{1}{2}$ and $f(N, k', \Delta) \leq \tfrac{1}{2}$
for any positive integer $k'$.

DEFINITION 2. A positive integer $k$ is called $\epsilon$-best ($0 \leq \epsilon \leq 1$) with respect to
the number of observations $N$ if $\epsilon$ is the smallest number in the interval $[0, 1]$ for
which the following condition is fulfilled: There exists a $\Delta$ such that
$f(N, k, \Delta) \geq \tfrac{1}{2} - \epsilon$ and $f(N, k', \Delta) \leq \tfrac{1}{2} + \epsilon$ for any positive integer $k'$.

It is obvious that an $\epsilon$-best $k$ is a best $k$ if $\epsilon = 0$. If $\epsilon$ is very small an $\epsilon$-best
$k$ is for all practical purposes equivalent to a best $k$.

Since $f(N, k, \Delta)$ is a continuous function of $\Delta$ it is easy to see that for any
pair of positive integers $k$ and $N$ there exists exactly one value $\epsilon$ such that $k$ is
$\epsilon$-best with respect to the number of observations $N$. Since the value of this $\epsilon$
is a function of $k$ and $N$ we will denote it by $\epsilon(k, N)$.

DEFINITION 3. A sequence $\{k_N\}$ of positive integers is called best in the limit if
$\lim_{N = \infty} \epsilon(k_N, N) = 0$.

In this paper the following theorem is proved:

THEOREM 1. Let $k_N = 4 \sqrt[5]{\dfrac{2(N-1)^2}{c^2}}$ where $c$ is determined so that
$\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{c}^{\infty} e^{-x^2/2}\, dx$ is equal to the size of the critical region (probability of the critical
region under the null hypothesis); then the sequence $\{k_N\}$ is best in the limit.
Furthermore $\lim_{N = \infty} f(N, k_N, \Delta_N) = \tfrac{1}{2}$ for $\Delta_N = \dfrac{5}{k_N} - \dfrac{4}{k_N^2}$.

It is further shown that for $N > 450$, if the 5% level of significance is used,
and for $N > 300$, if the 1% level of significance is used, the value of $\epsilon(k_N, N)$
is small so that for practical purposes $k_N$ can be considered as a best $k$. The
authors are convinced, although no rigorous proof has been given, that $\epsilon(k_N, N)$
is quite small for $N > 200$ and is very likely to be small even for considerably
lower values of $N$.
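The rule of Theorem 1 is simple to evaluate numerically. The following is a sketch only: the function name `mann_wald_k` is hypothetical, the normal quantile $c$ is taken from the Python standard library, and the result is left unrounded because the text does not say how a non-integer $k_N$ should be rounded.

```python
from statistics import NormalDist


def mann_wald_k(n_obs, size):
    """k_N = 4 * (2 (N - 1)^2 / c^2)^(1/5), with c the upper normal quantile.

    size is the probability of the critical region under the null
    hypothesis, e.g. 0.05 for the 5% level.
    """
    if n_obs < 2 or not 0 < size < 0.5:
        raise ValueError("need N >= 2 and 0 < size < 0.5")
    c = NormalDist().inv_cdf(1 - size)   # upper-tail area beyond c equals size
    return 4 * (2 * (n_obs - 1) ** 2 / c ** 2) ** 0.2


for n_obs, size in [(450, 0.05), (300, 0.01), (200, 0.05)]:
    print(n_obs, size, round(mann_wald_k(n_obs, size), 2))
```

Because $k_N$ grows only as $N^{2/5}$, doubling the sample size increases the suggested number of classes by a factor of about $2^{0.4} \approx 1.32$.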


1. Mean value and standard deviation of the statistic under alternative
hypotheses. It is well known that every continuous distribution can by a simple
transformation be transformed into a rectangular distribution with range $[0, 1]$.
We may therefore for convenience assume that the hypothesis to be tested is
that of a rectangular distribution with the range $[0, 1]$. Moreover as mentioned
earlier we assume that a procedure is chosen by which the class probabilities
under the null hypothesis are equal to each other.

The statistic whose mean value and standard deviation is to be determined is

\[
\sum_{i=1}^{k} x_i^2 = \chi'^2 \quad \text{where} \quad x_i = \sqrt{\frac{k}{N}} \left( \alpha_i - \frac{N}{k} \right).
\]











308 H. B. MANN AND A. WALD 


Let p; be the probability under the alternative hypothesis that one observa- 
tion will fall into the 7th class. The probability of obtaining certain specified 
values a1, a2, °°: , a is given by 


N! 

stad a1 a2 a}: 
) Pi Co ae 

Ak: 


f(ar, a2, +++ a) = - — 
G13 Geis « 


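Since $\sum_{i=1}^{k}\alpha_i = N$ we have

$$\sum_{i=1}^{k} x_i^2 = \frac{k}{N}\sum_{i=1}^{k}\alpha_i^2 - N. \tag{1}$$

We consider the function

$$(p_1e^{t_1} + p_2e^{t_2} + \cdots + p_ke^{t_k})^N = \sum f(\alpha_1, \alpha_2, \ldots, \alpha_k)\, e^{\alpha_1t_1 + \alpha_2t_2 + \cdots + \alpha_kt_k}.$$

Differentiating twice and then setting $t_i = 0$ for $i = 1, 2, \ldots, k$ we obtain

$$N(N-1)p_i^2 + Np_i = E(\alpha_i^2), \qquad N(N-1)p_ip_j = E(\alpha_i\alpha_j) \ \text{ for } i \neq j. \tag{2}$$

Hence

$$E\Big(\sum_{i=1}^{k}\alpha_i^2\Big) = N(N-1)\sum_{i=1}^{k}p_i^2 + N,$$

and

$$E(\chi^2) = k(N-1)\sum_{i=1}^{k}p_i^2 + k - N. \tag{3}$$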


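To compute the standard deviation of $\chi^2$ we put

$$\mu_i = \Big(Np_i - \frac{N}{k}\Big)\sqrt{\frac{k}{N}} = \sqrt{Nk}\,\Big(p_i - \frac{1}{k}\Big), \qquad y_i = (\alpha_i - Np_i)\sqrt{\frac{k}{N}},$$

hence $y_i = x_i - \mu_i$, $E(y_i) = 0$.

We have

$$\sigma^2_{\chi^2} = E\Big[\Big(\sum_{i=1}^{k}(y_i+\mu_i)^2 - E\Big(\sum_{i=1}^{k}(y_i+\mu_i)^2\Big)\Big)^2\Big] = E\Big[\Big(\sum_{i=1}^{k}y_i^2 + 2\sum_{i=1}^{k}y_i\mu_i - E\Big(\sum_{i=1}^{k}y_i^2\Big)\Big)^2\Big].$$

Let

$$\eta_i = N\Big(p_i - \frac{1}{k}\Big), \qquad z_i = \alpha_i - Np_i;$$

then

$$\sigma^2_{\chi^2} = \frac{k^2}{N^2}\,E\Big[\Big(\sum_{i=1}^{k}z_i^2 + 2\sum_{i=1}^{k}z_i\eta_i - E\Big(\sum_{i=1}^{k}z_i^2\Big)\Big)^2\Big].$$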








We now assume that $N$ is so large that the joint distribution of the $z_i$ is sufficiently well approximated by a multivariate normal distribution. Then

$$E(z_iz_j^2) = 0, \qquad E(z_i^4) = 3[E(z_i^2)]^2, \qquad E(z_i^2z_j^2) = E(z_i^2)E(z_j^2) + 2[E(z_iz_j)]^2 \ \text{ for } i \neq j.$$

We have the well known relations

$$E(z_i^2) = E(\alpha_i^2) - N^2p_i^2 = Np_i(1-p_i),$$
$$E(z_iz_j) = E(\alpha_i\alpha_j) - N^2p_ip_j = -Np_ip_j.$$
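Using the above equations we obtain

$$\frac{N^2}{k^2}\,\sigma^2_{\chi^2} = E\Big[\Big(\sum_{i=1}^{k}z_i^2\Big)^2\Big] - \Big[E\Big(\sum_{i=1}^{k}z_i^2\Big)\Big]^2 + 4E\Big[\Big(\sum_{i=1}^{k}z_i\eta_i\Big)^2\Big],$$

$$E\Big[\Big(\sum z_i^2\Big)^2\Big] - \Big[E\Big(\sum z_i^2\Big)\Big]^2 = N^2\Big\{3\sum_{i=1}^{k}p_i^2(1-p_i)^2 + \sum_{i\neq j}\big[p_ip_j(1-p_i)(1-p_j) + 2p_i^2p_j^2\big] - \Big[\sum_{i=1}^{k}p_i(1-p_i)\Big]^2\Big\}$$

$$= 2N^2\Big[\sum_{i=1}^{k}p_i^2(1-p_i)^2 + \sum_{i\neq j}p_i^2p_j^2\Big] = 2N^2\Big[\sum_{i=1}^{k}p_i^2 - 2\sum_{i=1}^{k}p_i^3 + \Big(\sum_{i=1}^{k}p_i^2\Big)^2\Big].$$

Further

$$E\Big[\Big(\sum z_i\eta_i\Big)^2\Big] = E\Big(\sum_{i=1}^{k}z_i^2\eta_i^2\Big) + E\Big(\sum_{i\neq j}z_iz_j\eta_i\eta_j\Big)$$

$$= N^3\Big[\sum_{i=1}^{k}p_i(1-p_i)\Big(p_i-\frac{1}{k}\Big)^2 - \sum_{i\neq j}p_ip_j\Big(p_i-\frac{1}{k}\Big)\Big(p_j-\frac{1}{k}\Big)\Big]$$

$$= N^3\Big\{\sum_{i=1}^{k}p_i\Big(p_i-\frac{1}{k}\Big)^2 - \Big[\sum_{i=1}^{k}p_i\Big(p_i-\frac{1}{k}\Big)\Big]^2\Big\}.$$

Substituting this into the formula for $\sigma^2_{\chi^2}$ we finally obtain

$$\sigma^2_{\chi^2} = 2k^2\Big\{\sum_{i=1}^{k}p_i^2 + 2(N-1)\sum_{i=1}^{k}p_i^3 - (2N-1)\Big(\sum_{i=1}^{k}p_i^2\Big)^2\Big\}. \tag{4}$$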
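Formulas (3) and (4) can be checked against the familiar null case, in which all class probabilities equal 1/k and the chi-square statistic has mean k - 1 and variance 2(k - 1). The sketch below is illustrative only; the function names are not from the paper, and exact rational arithmetic is used simply to make the comparison exact.

```python
from fractions import Fraction


def chi2_mean(n_obs, probs):
    """Formula (3): E(chi^2) = k (N-1) sum p_i^2 + k - N."""
    k = len(probs)
    return k * (n_obs - 1) * sum(p ** 2 for p in probs) + k - n_obs


def chi2_variance(n_obs, probs):
    """Formula (4), valid under the large-N normal approximation of the text."""
    k = len(probs)
    s2 = sum(p ** 2 for p in probs)
    s3 = sum(p ** 3 for p in probs)
    return 2 * k ** 2 * (s2 + 2 * (n_obs - 1) * s3 - (2 * n_obs - 1) * s2 ** 2)


# Null hypothesis: k = 8 equally probable classes, N = 200 observations.
null_probs = [Fraction(1, 8)] * 8
null_mean = chi2_mean(200, null_probs)
null_variance = chi2_variance(200, null_probs)
```

With equal class probabilities the dependence on N cancels in both formulas, which is a useful sanity check on the reconstruction of (4).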


2. The Taylor expansion of the power. Let $C$ be determined so that the probability under the null hypothesis that $\sum_{i=1}^{k} x_i^2 > C$ is equal to the size $\lambda_0$ of the critical region. Let $P\big(\sum_{i=1}^{k} x_i^2 > C\big)$ be the probability under the alternative hypothesis that $\sum_{i=1}^{k} x_i^2 > C$. Then the power $P$ is given by


t=] 








(5) 


where 









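and (5) can be written in the form

$$P\Big(\sum_{i=1}^{k}\alpha_i^2 > C'\Big) \tag{6}$$

where $C'$ is a certain function of $N$ and $k$. Let $\delta_i = p_i - \dfrac{1}{k}$ where $p_i$ is the probability of the $i$th class interval under the alternative hypothesis.

Expanding $P$ into a power series we obtain (in this and the following derivations, we take all partial differential quotients at the point $\delta_1 = \delta_2 = \cdots = \delta_k = 0$)

$$P = \lambda_0 + \sum_{i=1}^{k}\frac{\partial P}{\partial\delta_i}\,\delta_i + \frac{1}{2}\Big\{\sum_{i=1}^{k}\frac{\partial^2P}{\partial\delta_i^2}\,\delta_i^2 + \sum_{i\neq j}\frac{\partial^2P}{\partial\delta_i\,\partial\delta_j}\,\delta_i\delta_j\Big\} + \cdots.$$

Since $P$ is a symmetric function of the $\delta_i$ we have for $\delta_1 = \delta_2 = \cdots = \delta_k = 0$

$$\frac{\partial P}{\partial\delta_i} = \frac{\partial P}{\partial\delta_1}, \qquad \frac{\partial^2P}{\partial\delta_i^2} = \frac{\partial^2P}{\partial\delta_1^2}, \qquad \frac{\partial^2P}{\partial\delta_i\,\partial\delta_j} = \frac{\partial^2P}{\partial\delta_1\,\partial\delta_2} \ \text{ for } i \neq j.$$

Furthermore $\sum_{i=1}^{k}\delta_i = 0$. Therefore

$$P = \lambda_0 + \frac{1}{2}\Big\{\frac{\partial^2P}{\partial\delta_1^2}\sum_{i=1}^{k}\delta_i^2 + \frac{\partial^2P}{\partial\delta_1\,\partial\delta_2}\sum_{i\neq j}\delta_i\delta_j\Big\} + \cdots.$$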






We shall first show that the terms of second order are always positive. This shows that the test is unbiased and justifies again the choice of equal class probabilities under the null hypothesis since this assures unbiasedness and minimizes among all unbiased tests the g.l.b. of the distances of such alternatives whose power is equal to the size of the critical region.
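The power is given by

$$P = \sum_{\alpha_1^2+\alpha_2^2+\cdots+\alpha_k^2 > C'} \frac{N!}{\alpha_1!\,\alpha_2!\cdots\alpha_k!}\; p_1^{\alpha_1}p_2^{\alpha_2}\cdots p_k^{\alpha_k}.$$

Since $\sum_{i=1}^{k}\delta_i^2 = -\sum_{i\neq j}\delta_i\delta_j$ we obtain for the second order terms

$$\frac{1}{2}\Big\{\frac{\partial^2P}{\partial\delta_1^2}\sum_{i=1}^{k}\delta_i^2 + \frac{\partial^2P}{\partial\delta_1\,\partial\delta_2}\sum_{i\neq j}\delta_i\delta_j\Big\} = \frac{1}{2}\Big(\frac{\partial^2P}{\partial\delta_1^2} - \frac{\partial^2P}{\partial\delta_1\,\partial\delta_2}\Big)\sum_{i=1}^{k}\delta_i^2$$

$$= \frac{k^2}{2}\sum_{\alpha_1^2+\alpha_2^2+\cdots+\alpha_k^2 > C'} (\alpha_1^2 - \alpha_1 - \alpha_1\alpha_2)\,p(\alpha_1, \alpha_2, \ldots, \alpha_k)\;\sum_{i=1}^{k}\delta_i^2 > 0, \tag{7}$$

where

$$p(\alpha_1, \ldots, \alpha_k) = \frac{N!}{\alpha_1!\,\alpha_2!\cdots\alpha_k!}\,\frac{1}{k^N}.$$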


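In the following derivation extend all sums, if not otherwise stated, over all terms for which $\sum_{i=1}^{k}\alpha_i^2 > C'$ and use the relation $\sum_{i=1}^{k}\alpha_i = N$. We have because of the symmetry

$$\sum \alpha_1\,p(\alpha_1, \alpha_2, \ldots, \alpha_k) = \frac{N}{k}\sum p(\alpha_1, \alpha_2, \ldots, \alpha_k) = \frac{N}{k}\lambda_0,$$

$$\sum \alpha_1\alpha_2\,p(\alpha_1, \ldots, \alpha_k) = \frac{1}{k(k-1)}\sum\Big(N^2 - \sum_{i=1}^{k}\alpha_i^2\Big)p(\alpha_1, \ldots, \alpha_k) = \frac{N^2\lambda_0}{k(k-1)} - \frac{1}{k-1}\sum\alpha_1^2\,p(\alpha_1, \ldots, \alpha_k).$$

Hence the coefficient of the second order term becomes, up to the positive factor $k^2/2$,

$$\frac{k}{k-1}\sum\alpha_1^2\,p(\alpha_1, \ldots, \alpha_k) - \frac{N}{k}\lambda_0 - \frac{N^2}{k(k-1)}\lambda_0 = \frac{1}{k-1}\sum\sum_{i=1}^{k}\alpha_i^2\,p(\alpha_1, \ldots, \alpha_k) - \frac{N}{k}\lambda_0 - \frac{N^2}{k(k-1)}\lambda_0.$$

But

$$\frac{1}{\lambda_0}\sum\sum_{i=1}^{k}\alpha_i^2\,p(\alpha_1, \ldots, \alpha_k) > E\Big(\sum_{i=1}^{k}\alpha_i^2\Big)$$

since the conditional mean for values of $\sum_{i=1}^{k}\alpha_i^2 > C'$ must be larger than the mean of all values of $\sum_{i=1}^{k}\alpha_i^2$. Since $E\big(\sum_{i=1}^{k}\alpha_i^2\big) = \dfrac{N^2}{k} - \dfrac{N}{k} + N$, we obtain

$$\frac{1}{k-1}\sum\sum_{i=1}^{k}\alpha_i^2\,p(\alpha_1, \ldots, \alpha_k) > \frac{\lambda_0}{k-1}\Big(\frac{N^2}{k} + \frac{N(k-1)}{k}\Big) = \lambda_0\Big(\frac{N^2}{k(k-1)} + \frac{N}{k}\Big)$$

and hence the coefficient of $\sum_{i=1}^{k}\delta_i^2$ is larger than 0.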
To prove Theorem 1, we will have to determine the alternative distribution for which $\sum_{i=1}^{k}\delta_i^2$ becomes a minimum subject to the condition that the distance from the null hypothesis should be greater than or equal to a given $\Delta$. Hence we have to find a distribution function $F(x)$ such that $|F(x) - x| \geq \Delta$ for at least one value $x$ and

$$\sum_{i=1}^{k}\delta_i^2 = \sum_{i=1}^{k}\Big(p_i - \frac{1}{k}\Big)^2 = \sum_{i=1}^{k}p_i^2 - \frac{1}{k}$$

is a minimum, where $p_i = F\big(\frac{i}{k}\big) - F\big(\frac{i-1}{k}\big)$. Instead of minimizing $\sum_{i=1}^{k}\delta_i^2$ we may minimize $\sum_{i=1}^{k}p_i^2$, since the two expressions differ merely by a constant. There will be two different solutions for $F(x)$ depending on whether $F(x) - x \geq \Delta$ or $F(x) - x \leq -\Delta$ for at least one value $x$. Because of symmetry we restrict ourselves to the case in which $F(x) - x \geq \Delta$ for at least one value of $x$.

Let $a$ be a value for which $F(a) - a \geq \Delta$ and suppose that

$$\frac{l-1}{k} < a \leq \frac{l}{k}.$$

We prove first that then

$$\sum_{i=1}^{l}p_i > \frac{l}{k} + \Delta - \frac{1}{k}.$$

PROOF: Since $F\big(\frac{l}{k}\big) - F(a) \geq 0$ we have

$$F\Big(\frac{l}{k}\Big) = F(a) + F\Big(\frac{l}{k}\Big) - F(a) \geq a + \Delta$$

and

$$\sum_{i=1}^{l}p_i = F\Big(\frac{l}{k}\Big) \geq a + \Delta > \frac{l-1}{k} + \Delta = \frac{l}{k} + \Delta - \frac{1}{k}.$$
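If $\Delta < \dfrac{1}{k}$ we can always find a distribution function in $C(\Delta)$ for which $p_i = \dfrac{1}{k}$. Hence we consider only the case $\Delta \geq \dfrac{1}{k}$. We must minimize $\sum_{i=1}^{k}p_i^2$ under the condition $\sum_{i=1}^{l}p_i = \dfrac{l}{k} + \epsilon$, $\sum_{i=l+1}^{k}p_i = \dfrac{k-l}{k} - \epsilon$. We therefore minimize

$$\sum_{i=1}^{k}p_i^2 - 2\lambda_1\sum_{i=1}^{l}p_i - 2\lambda_2\sum_{i=l+1}^{k}p_i.$$

This leads to

$$p_i = \begin{cases}\dfrac{1}{k} + \dfrac{\epsilon}{l} & \text{for } i = 1, \ldots, l,\\[2mm] \dfrac{1}{k} - \dfrac{\epsilon}{k-l} & \text{for } i = l+1, \ldots, k.\end{cases}$$

We then have

$$\sum_{i=1}^{k}p_i^2 = l\Big(\frac{1}{k} + \frac{\epsilon}{l}\Big)^2 + (k-l)\Big(\frac{1}{k} - \frac{\epsilon}{k-l}\Big)^2 = \frac{1}{k} + \frac{\epsilon^2k}{l(k-l)}.$$

This is smallest if $\epsilon = \Delta - \dfrac{1}{k}$ and $l = \dfrac{k}{2}$. The following discontinuous distribution function gives these values for $\epsilon$, $l$ and $p_i$ and has the distance $\Delta$ from the rectangular distribution.

$$\begin{aligned}
F(x) &= \Big[1 + 2\Big(\Delta - \frac{1}{k}\Big)\Big]x && \text{for } 0 \leq x \leq \frac{1}{2} - \frac{1}{k},\\
F(x) &= \frac{1}{2} + \Delta - \frac{1}{k} && \text{for } \frac{1}{2} - \frac{1}{k} < x \leq \frac{1}{2},\\
F(x) &= \Big[1 - 2\Big(\Delta - \frac{1}{k}\Big)\Big]x + 2\Big(\Delta - \frac{1}{k}\Big) && \text{for } \frac{1}{2} \leq x \leq 1,\\
F(x) &= 0 && \text{for } x < 0,\\
F(x) &= 1 && \text{for } x \geq 1.
\end{aligned} \tag{8}$$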
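The least favorable distribution (8) can be tabulated directly. The sketch below assumes an even number of classes k (so that l = k/2 is an integer) and uses exact fractions; the names `worst_case_cdf` and `class_probabilities` are illustrative and do not appear in the paper.

```python
from fractions import Fraction


def worst_case_cdf(x, delta, k):
    """Distribution function (8) for distance delta and k classes (k even)."""
    x = Fraction(x)
    eps = Fraction(delta) - Fraction(1, k)   # epsilon = Delta - 1/k
    half = Fraction(1, 2)
    if x < 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    if x <= half - Fraction(1, k):
        return (1 + 2 * eps) * x             # steeper than the rectangular law
    if x <= half:
        return half + eps                    # flat piece following the jump
    return (1 - 2 * eps) * x + 2 * eps       # flatter than the rectangular law


def class_probabilities(delta, k):
    """p_i = F(i/k) - F((i-1)/k) for i = 1, ..., k."""
    return [
        worst_case_cdf(Fraction(i, k), delta, k)
        - worst_case_cdf(Fraction(i - 1, k), delta, k)
        for i in range(1, k + 1)
    ]


# Example: Delta = 1/5 with k = 10 classes, so epsilon = 1/10.
probs = class_probabilities(Fraction(1, 5), 10)
sum_of_squares = sum(p ** 2 for p in probs)
```

In this example the first five classes each receive probability 3/25 and the last five 2/25, and the sum of squares equals 1/k + 4ε²/k as in the text.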


3. Solution for large N. Denote by $F(\Delta, k)$ the distribution function (8) of $C(\Delta)$ which makes $\sum_{i=1}^{k}\delta_i^2$ a minimum if the test is made with $k$ class intervals. Assume that $k$ is large enough that $\chi^2$ can be taken as normally distributed. The power of the test is then given by






















$$\frac{1}{\sqrt{2\pi}\,\sigma'}\int_{(k-1)+c\sqrt{2(k-1)}}^{\infty}\exp\Big[-\frac{1}{2\sigma'^2}\Big(\sum_{i=1}^{k}x_i^2 - E\Big(\sum_{i=1}^{k}x_i^2\Big)\Big)^2\Big]\,d\Big(\sum_{i=1}^{k}x_i^2\Big) = \frac{1}{\sqrt{2\pi}}\int_{\frac{-E\left(\sum x_i^2\right)+(k-1)+c\sqrt{2(k-1)}}{\sigma'}}^{\infty} e^{-y^2/2}\,dy, \tag{9}$$

where $\sigma'$ is the standard deviation of $\sum_{i=1}^{k}x_i^2$ and $c$ is determined so that $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_c^{\infty}e^{-y^2/2}\,dy$ is equal to the size of the critical region. Hence to maximize the power with respect to $k$ is equivalent to maximizing

$$\frac{E\Big(\sum_{i=1}^{k}x_i^2\Big) - (k-1) - c\sqrt{2(k-1)}}{\sigma'}$$

with respect to $k$.
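Under the alternative $F(\Delta, k)$ we obtain

$$E\Big(\sum_{i=1}^{k}x_i^2\Big) - (k-1) = k(N-1)\sum_{i=1}^{k}p_i^2 + k - N - k + 1 = 4(N-1)\Big(\Delta - \frac{1}{k}\Big)^2.$$

Hence

$$\psi(k) = \frac{4(N-1)\Big(\Delta - \dfrac{1}{k}\Big)^2 - c\sqrt{2(k-1)}}{\sigma'}.$$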


We choose $\Delta$ so that this maximum power is exactly $\tfrac{1}{2}$, that is, so that $\psi(k) = 0$ for that $k$ which maximizes $\psi(k)$. Denote this value of $\Delta$ by $\Delta_N$ and let $k_N$ be the value of $k$ which maximizes $\psi(k)$. The differential quotient of the numerator of $\psi(k)$ with respect to $k$ is then equal to 0 for $k = k_N$. Hence

$$8(N-1)\Big(\Delta_N - \frac{1}{k_N}\Big)\frac{1}{k_N^2} = \frac{c}{\sqrt{2(k_N-1)}}. \tag{10}$$

Furthermore since $\psi(k_N) = 0$ we have

$$4(N-1)\Big(\Delta_N - \frac{1}{k_N}\Big)^2 = c\sqrt{2(k_N-1)}. \tag{11}$$

Solving equations (10) and (11) we obtain

$$\Delta_N = \frac{5}{k_N} - \frac{4}{k_N^2} \tag{12}$$

and

$$\frac{k_N^4}{(k_N-1)^{3/2}} = \frac{32\sqrt{2}\,(N-1)}{c}$$











or since ky > 3, 


ke <4 4/2 thet. 


Hence 


(13) either ky = E y/ uN 0) — [ ‘ fee : mF] Le 


is the value of $k$ for which the power with respect to $F(\Delta_N, k)$ becomes a maximum. We have merely to show that $\psi''(k)$ is negative for $k = k_N$. Using the fact that $\psi(k_N) = \psi'(k_N) = 0$ we obtain


$$\sigma'\psi''(k_N) = \frac{-16(N-1)}{k_N^3}\,\Delta_N + \frac{24(N-1)}{k_N^4} + \frac{c}{\big(\sqrt{2(k_N-1)}\big)^3}.$$

Substituting for $\Delta_N$ the right hand side of (12) we obtain on account of (10)

$$\sigma'\psi''(k_N) = \frac{-56(N-1)}{k_N^4} + \frac{64(N-1)}{k_N^5} + \frac{8(N-1)}{2(k_N-1)\,k_N^2}\Big(\frac{4}{k_N} - \frac{4}{k_N^2}\Big).$$

Using $2(k-1) > k$ we obtain

$$\sigma'\psi''(k_N) < \frac{1}{k_N^4}\Big(-24(N-1) + \frac{64(N-1)}{k_N}\Big)$$

which is negative. $\sigma'$ can be shown to be of order $\sqrt{k_N}$; $\psi''(k_N)$ is, therefore, of order $\dfrac{N}{k_N^{9/2}} = O\Big(\dfrac{1}{N^{4/5}}\Big)$. The maximum is, therefore, rather flat for large values of $N$.
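The stationarity argument can be illustrated numerically. The sketch below solves the exact relation $k^4/(k-1)^{3/2} = 32\sqrt{2}(N-1)/c$ by bisection, sets $\Delta_N$ from (12), and then evaluates the numerator of $\psi(k)$ at neighbouring values of $k$. The helper names, the bracket for the bisection and the choice N = 450 at the 5% level are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def psi_numerator(k, n_obs, c, delta):
    """Numerator of psi(k): 4 (N-1) (Delta - 1/k)^2 - c sqrt(2 (k-1))."""
    return 4.0 * (n_obs - 1) * (delta - 1.0 / k) ** 2 - c * math.sqrt(2.0 * (k - 1.0))


def solve_k(n_obs, c, lo=3.0, hi=1.0e4):
    """Bisection for k^4 / (k-1)^(3/2) = 32 sqrt(2) (N-1) / c.

    The left side is increasing for k > 1.6, so one root lies in [lo, hi].
    """
    target = 32.0 * math.sqrt(2.0) * (n_obs - 1) / c
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid ** 4 / (mid - 1.0) ** 1.5 > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


n_obs = 450
c = NormalDist().inv_cdf(0.95)                  # 5% level
k_star = solve_k(n_obs, c)
delta_star = 5.0 / k_star - 4.0 / k_star ** 2   # equation (12)

at_optimum = psi_numerator(k_star, n_obs, c, delta_star)
nearby = [psi_numerator(k_star + d, n_obs, c, delta_star) for d in (-5, -1, 1, 5)]
```

The numerator vanishes at the solution and is negative on both sides of it, though only slightly, which is the flatness of the maximum noted in the text.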

We shall now show that if $k$ is large enough to assume $\chi^2$ to be normally distributed then $F(\Delta, k)$ is the alternative which gives the smallest power compared with all alternatives in the class $C(\Delta)$ provided the power for the alternative $F(\Delta, k)$ equals $\tfrac{1}{2}$.

We know that $E\big(\sum_{i=1}^{k}x_i^2\big)$ is smallest for $F(\Delta, k)$. Since the power with respect to $F(\Delta, k)$ equals $\tfrac{1}{2}$ we have

$$E\Big(\sum_{i=1}^{k}x_i^2\Big) - (k-1) - c\sqrt{2(k-1)} = 0.$$

Thus the lower limit of the integral in (9) becomes negative for every other alternative and the power will be larger than $\tfrac{1}{2}$.

The power with respect to $F(\Delta_N, k_N)$ is equal to $\tfrac{1}{2}$, hence if we choose $k = k_N$ the power of the test will be $\geq \tfrac{1}{2}$ for all alternatives in the class $C(\Delta_N)$. On the other hand if we choose $k \neq k_N$ then there will be at least one alternative in $C(\Delta_N)$ for which the power is $< \tfrac{1}{2}$. (For instance $F(\Delta_N, k)$ is such an alternative.)

³ Cantelli's formula and its proof are given by Fréchet in his book Recherches Théoriques Modernes sur la Théorie des Probabilités, Paris (1937), pp. 123-126.

The above statements have been derived under the assumption that $\chi^2$ is normally distributed. Hence if the distribution of $\chi^2$ were exactly normal, $k_N = 4\sqrt[5]{\dfrac{2(N-1)^2}{c^2}}$ would be a best $k$ and for this $k_N$ and $\Delta_N = \dfrac{5}{k_N} - \dfrac{4}{k_N^2}$ the greatest lower bound of the power in the class $C(\Delta_N)$ would be exactly $\tfrac{1}{2}$. Since the distribution of $\chi^2$ approaches the normal distribution with $k \to \infty$ the sequence $\{k_N\}$ is best in the limit and Theorem 1 stated in the introduction is proved.

For the purposes of practical applications, it is not enough to know that $\{k_N\}$ is best in the limit. We have to know for what values of $N$ the value $k_N$ can be considered practically as a best $k$, i.e. for what values of $N$ the quantity $e(k_N, N)$ defined in the introduction is sufficiently small. The quantity $e(k_N, N)$ is certainly small if for the number of class intervals $k_N$ the distribution of $\chi^2$ is near to normal and if the power with respect to at least one alternative of the class $C(\Delta_N)$ is smaller than $\tfrac{1}{2}$ also in the case when the number of class intervals is too small to assume a normal distribution for $\chi^2$.

We shall in the following assume that for $k > 13$ the normal distribution is a sufficiently good approximation. Actually we need not assume a normal distribution but only that the probability is close to $\tfrac{1}{2}$ that the statistic will exceed its mean value.

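Cantelli³ gave the following formula. Let $M_r$ be the $r$th moment of a distribution about $x_0$. Let $d$ be any arbitrary positive number. Let $P(|x - x_0| < d)$ be the probability that $|x - x_0| < d$; then the following inequalities hold:

$$\text{If } \frac{M_r}{d^r} \leq \frac{M_{2r}}{d^{2r}}, \text{ then } P(|x - x_0| < d) \geq 1 - \frac{M_r}{d^r}.$$

$$\text{If } \frac{M_r}{d^r} \geq \frac{M_{2r}}{d^{2r}}, \text{ then } P(|x - x_0| < d) \geq 1 - \frac{M_{2r} - M_r^2}{(d^r - M_r)^2 + M_{2r} - M_r^2}.$$

Since $\chi^2$ can only take positive values we have

$$\text{If } \frac{E(\chi^2)}{c_k} \leq \frac{\sigma^2_{\chi^2} + [E(\chi^2)]^2}{c_k^2}, \text{ then } P(\chi^2 < c_k) \geq 1 - \frac{E(\chi^2)}{c_k}. \tag{14}$$

$$\text{If } \frac{E(\chi^2)}{c_k} \geq \frac{\sigma^2_{\chi^2} + [E(\chi^2)]^2}{c_k^2}, \text{ then } P(\chi^2 < c_k) \geq 1 - \frac{\sigma^2_{\chi^2}}{(c_k - E(\chi^2))^2 + \sigma^2_{\chi^2}}. \tag{15}$$

Here $c_k$ is determined so that $P(\chi^2 > c_k)$ equals the size of the critical region if the null hypothesis is true and the number of class intervals equals $k$. $c_k$ can be obtained from a table of the chi-square distribution.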
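The two cases (14) and (15) translate into a short function. This is a sketch for a non-negative variable with given mean and variance; the name `cantelli_lower_bound` and the numerical inputs are illustrative and are not values from the paper.

```python
def cantelli_lower_bound(mean, variance, c):
    """Lower bound for P(X < c), X >= 0, following cases (14) and (15)."""
    second_moment = variance + mean ** 2
    if mean / c <= second_moment / c ** 2:
        # Case (14): only the first moment is used.
        return 1.0 - mean / c
    # Case (15): the variance sharpens the bound.
    return 1.0 - variance / ((c - mean) ** 2 + variance)


bound_case_14 = cantelli_lower_bound(mean=1.0, variance=100.0, c=2.0)
bound_case_15 = cantelli_lower_bound(mean=10.0, variance=20.0, c=20.0)
```

In the second call the mean lies well below c relative to the standard deviation, so case (15) applies and the bound is 5/6.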


For $F(\Delta_N, k)$ we obtain with $\Delta_N' = \dfrac{5}{k_N} - \dfrac{4}{k_N^2} - \dfrac{1}{k}$ from (3) and (4)

$$E(\chi^2) = (k-1) + 4(N-1)\Delta_N'^2,$$
$$\sigma^2_{\chi^2} = 2(k-1) + 8\Delta_N'^2(k + 2N - 4) - 32(2N-1)\Delta_N'^4.$$


By numerically calculating $E(\chi^2)$ and $\sigma_{\chi^2}$ for $N = 450$ and a 5% level of significance, for $N = 300$ and a 1% level of significance, and for $k = 13, 12, \ldots, \Big[\dfrac{1}{\Delta_N}\Big] + 1$ it can be shown that for these values of $N$ and $k$

$$\frac{E(\chi^2)}{c_k} \geq \frac{\sigma^2_{\chi^2} + [E(\chi^2)]^2}{c_k^2}. \tag{16}$$


Hence we have to use (15). From (16) it follows that $c_k \geq E(\chi^2)$. If $P(\chi^2 < c_k) < \tfrac{1}{2}$ we obtain on account of (15) and (16)

$$\frac{\sigma^2_{\chi^2}}{(c_k - E(\chi^2))^2 + \sigma^2_{\chi^2}} > \frac{1}{2}, \qquad \sigma^2_{\chi^2} > (c_k - E(\chi^2))^2, \qquad \sigma_{\chi^2} + E(\chi^2) > c_k.$$

Numerical calculation shows that for the values of $N$ and $k$ and the significance levels considered

$$\sigma_{\chi^2} + E(\chi^2) < c_k. \tag{17}$$

It can then be shown that for $N \geq 450$ and $N \geq 300$ respectively $N\Delta_N'$ decreases with $N$. A simple argument then shows that (16) and (17) are also true for all values $N \geq 450$ and $N \geq 300$ respectively. Hence the power with respect to $F(\Delta_N, k)$ is $\leq \tfrac{1}{2}$ for these values of $N$. Thus we see: For $N \geq 450$ if the 5% level is used, and for $N \geq 300$ if the 1% level is used, the value $k_N = 4\sqrt[5]{\dfrac{2(N-1)^2}{c^2}}$ can be considered for practical purposes as a best $k$. The value $c$ is determined so that $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_c^{\infty}e^{-t^2/2}\,dt$ is equal to the size of the critical region.
V oT de 


LIMITED TYPE OF PRIMARY PROBABILITY DISTRIBUTION 
APPLIED TO ANNUAL MAXIMUM FLOOD FLOWS 


By BRADFORD F. KIMBALL
Port Washington, N.Y. 


1. Theoretical statement of problem. There is no doubt that Gumbel’s 
recent paper “The Return Period of Flood Flows” [1] has supplied an admirably 
simple technique for engineers to use in approximating the trend of return periods 
of annual maximum flood flows for purposes of extrapolation. This treatment 
is scientifically of great interest because it introduces for the first time into a 
subject already treated at considerable length by engineers, the theory of the 
probability distribution of maximum values as developed by Fisher and Tippett, von Mises, and others.¹ However, certain further observations should be
made concerning the approach used by Gumbel. 

Let $x$ represent the measure of daily stream flow having a probability distribution $w(x)$. Let the probability distribution of the associated annual maximum stream flows be denoted by $V(x)$ with

$$W(x) = \int^{x} V(s)\,ds, \tag{1}$$

denoting probability that annual maxima be less than or equal to $x$. The return period $T(x)$ of an annual maximum flow of measure $x$ is then defined by

$$T(x) = \frac{1}{1 - W(x)}. \tag{2}$$

In this paper the probability distribution $w(x)$ will be called the primary probability distribution associated with the probability distribution of maximum values $V(x)$ and its cumulative distribution $W(x)$.

Gumbel argues that for the type of primary probability distribution that might reasonably be expected to apply, $W(x)$ will be of the type introduced by R. A. Fisher:

$$W(x) = \exp\,[-\exp\,-\alpha(x - u)]. \tag{3}$$
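Definitions (2) and (3) are simple enough to state as code. The following sketch is illustrative; the function names and the parameter values in the example are assumptions, not quantities taken from the paper.

```python
import math


def fisher_type_cdf(x, alpha, u):
    """Cumulative distribution (3): W(x) = exp(-exp(-alpha (x - u)))."""
    return math.exp(-math.exp(-alpha * (x - u)))


def return_period(w):
    """Return period (2): T = 1 / (1 - W), for a non-exceedance probability W < 1."""
    return 1.0 / (1.0 - w)


# At x = u the distribution (3) equals exp(-1), whatever alpha is.
w_at_mode = fisher_type_cdf(x=5.0, alpha=0.3, u=5.0)
t_for_99_per_cent = return_period(0.99)
```

A flood that is not exceeded in 99 per cent of years thus has a return period of 100 years.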


It is further implied that a primary probability distribution involving an upper limit would lead to a probability distribution of maximum values of the type

$$W'(x) = \frac{k}{u}\Big(\frac{u}{x}\Big)^{k+1} e^{-(u/x)^k} \tag{4}$$

for which moments of order $k$ or higher do not exist. The inference is then drawn that a primary probability distribution leading to such a cumulative distribution of maximum values would seem to be less likely to be the correct one than one leading to the distribution (3). To this argument we do not object; but we question the implied conclusion that hence the use of a limited type of primary distribution is to be disallowed.

¹ See references at end of Gumbel's paper, loc. cit.

If the primary probability distribution be of the limited Galton type

$$w(x) = K \exp\,(-\tfrac{1}{2}u^2), \tag{5}$$

where $K$ is a constant and

$$u = k[b - \log\,(a - x)], \qquad 0 \leq x \leq a, \tag{6}$$

it can be shown that the limiting form of the cumulative distribution of maxima of $n$ values takes the same type form (3) where $x$ is replaced by $u$. This can be seen by observing that the transformed variate $u$ becomes infinite as $x$ approaches $a$, and hence has infinite range to the right, which places (5) in the category of distributions which are known to lead to cumulative distribution of maximum values of form (3). More explicitly, considering $w(x)$ as a finite distribution in $x$, if one traces the reasoning as set forth in von Mises' derivation [2] of the limiting distribution (3), one finds that since the cumulative primary probability $\displaystyle\int_0^x w(s)\,ds$ does not have a non-vanishing derivative of finite order at $x = a$, what von Mises terms the case of a limited distribution does not apply, while the argument for a cumulative distribution of maxima of form (3) does carry through, in spite of the fact that $x$ has limited range to the right. This fact was not mentioned by Gumbel.

One is thus led to the conclusion that there is no logical exclusion of the 
assumption of a primary probability distribution of the form (5). 

One might well argue for a first approximation of the actual primary probability distribution of stream flows, using any regular time interval such as a day or an hour, of the form (5). Differentiating $u$ with respect to $x$, one obtains

$$k\,dx = (a - x)\,du, \tag{7}$$

which means that to a constant probability increment $\Delta u$ there corresponds a maximum increment $\Delta x$ in measure of stream flow equal to $(a/k)\Delta u$ when $x$ is at the lower limit zero. This corresponding increment in stream flow decreases linearly to zero as $x$ approaches its upper bound $a$, imposed because of the existence of a finite watershed.







2. Technique of fitting probability distribution of maximum values in case primary probability distribution is of the limited type (5)-(6). Write the cumulative maximum distribution (3) in the form:

$$W(x) = \exp\,(-\exp\,-y), \qquad y = \alpha(u(x) - u), \qquad u(x) = k[b - \log\,(a - x)], \qquad 0 \leq x \leq a. \tag{8}$$




Now it is known that for the distribution

$$dW = e^{-y}\,e^{-e^{-y}}\,dy, \tag{9}$$

the mean value and standard deviation of $y$ are given by

$$\bar{y} = .577215 \ \text{(Euler's constant } C\text{)}, \qquad \sigma(y) = \pi/\sqrt{6}. \tag{10}$$

Hence

$$\bar{y} = \alpha[\bar{u}(x) - u] = \alpha k[(b - u/k) - \bar{L}] = C,$$

where $\bar{L}$ denotes the mean value of $\log\,(a - x)$, with $x$ representing the observed maximum flood flows. Also

$$\sigma(y) = \alpha k\,\sigma(L) = \pi/\sqrt{6}$$

where $\sigma(L)$ denotes the standard deviation of $\log\,(a - x)$. Hence

$$\alpha k = (\pi/\sqrt{6})/\sigma(L), \qquad b - u/k = C/\alpha k + \bar{L}, \tag{11}$$

and $y$ is determined as a function of $x$ by the relation

$$y = \alpha k[(b - u/k) - \log\,(a - x)]. \tag{12}$$


It is interesting to observe that it has not been necessary to determine the 
constants k and b of the primary probability distribution. Only the upper 
bound a and observed flood flows are used in this process. From the relation 
(12) the theoretical curve in terms of x may easily be computed from tables
relating y to W (See Gumbel, loc. cit., Table II, page 173).
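The fitting step (11) needs only the upper bound $a$ and the observed maxima. The sketch below is one possible reading of it: base-10 logarithms are used, as in the example of Section 3, and the population form of the standard deviation (`pstdev`) is an assumption, since the paper does not say which divisor was used. The function names are illustrative.

```python
import math
from statistics import mean, pstdev

EULER_C = 0.577215   # Euler's constant, as quoted in (10)


def constants_from_summary(l_bar, sigma_l):
    """Relation (11): return (alpha*k, b - u/k) from the mean and s.d. of log(a - x)."""
    alpha_k = (math.pi / math.sqrt(6.0)) / sigma_l
    offset = EULER_C / alpha_k + l_bar
    return alpha_k, offset


def fit_limited_type(floods, upper_bound):
    """Apply (11) to a series of observed annual maxima, all below upper_bound."""
    logs = [math.log10(upper_bound - x) for x in floods]
    return constants_from_summary(mean(logs), pstdev(logs))


# Summary values quoted for the Tennessee River series in Section 3.
alpha_k, offset = constants_from_summary(2.59772, 0.06576)
```

With the summary values of Section 3 this reproduces, to the figures printed there, 1/(αk) = .05127 and b - u/k = 2.6273.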

The difficulty of determining what the upper bound a should be in a specific 
case is a practical one and does not concern the objective theoretical problem 
of choosing the type of curve which most nearly describes the behavior of annual 
maximum flood flows. The point to be made in this paper is that the use of 
what seems to be a reasonable value of a, will materially alter forecasts of future 
annual flood flows relative to forecasts made on the assumption that such an 
upper limit may be neglected. It is also ventured that the resulting theoretical 
probability distribution of maxima will in general give a better fit to the series 
of observed floods than one based on the latter premise. Techniques for de- 
termination of upper bound a will not be discussed in this paper. 


3. Examples. In order to demonstrate the point in question the two methods
have been applied to a 57 year record of the annual flood flows of the Tennessee
River at Chattanooga for the years 1875 to 1931.²


2 The author has already used this series in a previous article [3] and for this reason has 
found it convenient to use it here. 










TABLE I
Series of observed annual flood flows
(Tennessee River at Chattanooga, 1875-1931)

    (1)               (2)         (3)            (4)
    Observed Flood    Ratio to    Per cent of    Return Period,
    x                 Mean        Time           T(x)

     85.9              .412        0.88          1.007
    108                .518        2.63          1.027
    123                .590        4.39
    . . .
    310               1.487       95.61
    349               1.674       97.37          38.0
    361               1.731       99.12


In Table I, Col. (1) is shown the incomplete series of observed annual floods in units of 1,000 c.f.s. arranged in order of magnitude. The complete series may be referred to in Water-Supply Paper 771 entitled "Floods in the United States," U. S. Geological Survey, 1936, p. 401. The mean annual maximum flood of this series is 208.56. The ratio of each annual maximum to the mean is shown in Col. (2). In Col. (4) is shown the observed return period which is taken here as the harmonic mean between what has been called the exceedance interval and the recurrence interval (see Gumbel, loc. cit., Table I, p. 167). Thinking of the 57 year record as a span of 57 years, the above procedure is equivalent to taking the observed probability W(x) that a given annual flood will not be exceeded as the mid-point of the part of this time-span covered by the observed flood in question. Thus the lowest flood-peak 85,900 c.f.s. corresponds to the span from zero to 1.754 per cent of the whole time-span, and hence W(x) is taken at the mid-point, 0.877 per cent. Similarly the greatest flood, 361,000 c.f.s., corresponds to the interval from 98.246 to 100 per cent and is taken at 99.12 per cent. These arithmetic means correspond to harmonic means of the "recurrence" and "exceedance" intervals referred to above. This is the procedure which Hazen [4] originally followed.

Data from Cols. (1) and (4) of this table determined position of dots on Fig. 1. Data from Cols. (2) and (3) gave the points indicated by dots on Fig. 2, with 1 - W(x) recorded on the chart rather than W(x).

The two theoretical distributions fitted to these annual flood maxima will be 
referred to as distributions A and B. 

Distribution A. In this case the limited type of primary probability distribution (5)-(6) is assumed. From previous studies of this data series made by the author [3], an upper limit of annual floods of some 609,000 c.f.s. was found to be reasonable, and for purposes of this example the same upper limit will be assumed for the primary probability distribution. Thus the transformation (6) becomes:

$$u = k[b - \log\,(609 - x)],$$





Fig. 1. Comparison of methods of fitting annual flood peaks (Tennessee River at Chattanooga, 1875-1931): return periods plotted against annual flood discharges on semi-logarithmic chart. (Vertical axis: Annual Flood.)


where the logarithm to base 10 can be used without loss of generality since the constant $k$ will absorb the conversion factor. The mean value of the logarithm, and its standard deviation come to

$$\bar{L} = 2.59772, \qquad \sigma(L) = .06576.$$

The constants of the transformation (12) are thus determined by

$$\alpha k = (\pi/\sqrt{6})/(.06576), \qquad b - u/k = C/(\alpha k) + 2.59772.$$

Thus

$$1/(\alpha k) = .05127, \qquad b - u/k = 2.6273$$

and solving (12) for $\log\,(609 - x)$,

$$\log\,(609 - x) = 2.6273 - (.05127)\,y. \tag{13}$$


Using a table for the known relations between y, W(x), and T(x) for the Fisher-Tippett distribution of maximum values similar to Table II of Gumbel's article (loc. cit.) the corresponding values x of the annual floods are easily determined. Thus a theoretical relation between x and W(x) is set up. This is indicated as Curve A on the two charts exhibited here.

Fig. 2. Comparison of methods of fitting annual flood peaks (Tennessee River at Chattanooga, 1875-1931): data plotted on logarithmic probability chart designed by Hazen, Whipple and Fuller. (Horizontal axis: Probability that Annual Flood will be Exceeded.)

Distribution B. The primary probability distribution in this case is taken 
as unlimited to the right, and in general is assumed to have the character of an 
exponentially decreasing function of the measure of stream flow x (see Gumbel, 
loc. cit.). The parameter $y$ of the distribution of annual maxima is given directly by

$$y = \alpha(x - x_0)$$

and

$$1/\alpha = (\sqrt{6}/\pi)\,(\text{stand. dev. of annual floods}) = (.77970)(58.26) = 45.425,$$
$$x_0 = (\text{mean annual flood}) - C/\alpha = 208.6 - (.57722)(45.425) = 182.4.$$

Hence

$$x = 182.4 + (45.425)\,y \tag{14}$$

and using the table of corresponding values of $y$, $W(x)$ and $T(x)$ for the Fisher-Tippett distribution referred to above, a theoretical relation between $x$ and $W(x)$ is easily set up. This is plotted as Curve B on the accompanying charts.
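The two fitted relations can be compared at a chosen return period without a table, by computing the reduced variate $y$ from (2) and (3) directly. The sketch below uses the constants printed in (13) and (14), with floods in units of 1,000 c.f.s.; the function names are illustrative, and the relation (14) is taken in the form $x = x_0 + y/\alpha$ implied by $y = \alpha(x - x_0)$.

```python
import math


def reduced_variate(return_period_years):
    """y such that exp(-exp(-y)) = W = 1 - 1/T, from (2) and (3)."""
    w = 1.0 - 1.0 / return_period_years
    return -math.log(-math.log(w))


def flood_curve_a(return_period_years):
    """Curve A, relation (13): log10(609 - x) = 2.6273 - 0.05127 y."""
    y = reduced_variate(return_period_years)
    return 609.0 - 10.0 ** (2.6273 - 0.05127 * y)


def flood_curve_b(return_period_years):
    """Curve B, relation (14): x = 182.4 + 45.425 y."""
    return 182.4 + 45.425 * reduced_variate(return_period_years)


flood_a_1000 = flood_curve_a(1000.0)
flood_b_1000 = flood_curve_b(1000.0)
```

At a return period of 1,000 years these constants give roughly 421 for Curve A against roughly 496 for Curve B, and Curve A can never exceed the assumed bound of 609.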





4. Discussion of examples. In Fig. 1 it is to be noted that if theoretical 
curves are continued to the right to give readings for a return period of 1,000 
years, the divergence of Curve A from Curve B is large enough to be of sig- 
nificance, numerically. Visual inspection does not indicate which curve is the 
better fit to the observation points. 

In Fig. 2 the curves are plotted on “logarithmic probability” graph paper. 
This paper was designed by Hazen and Fuller [4] specifically for the purpose of 
plotting annual maxima of stream-flows. A significant divergence in trend is 
to be noted at the right hand end. 

These charts indicate that the use of an upper limit may materially affect
extrapolation of fitted theoretical curves, for purposes of estimating floods with
a return period, say of 1,000 years.

If the trends of observed floods in Gumbel’s recent paper in the Transactions 
of the American Geophysical Union [5] are examined, it will be observed that 
in the case of the Connecticut, Mississippi and Rhone rivers, there is a decided 
tendency for the curve of observed floods to turn downwards, away from the 
theoretical curves, which correspond to Curve B exhibited in Figure 1. In 
the case of the Tennessee, Cumberland and Columbia rivers the tendency is 
not decisive, while in the case of the Rhine river at Basel (Switzerland) the 
tendency of the observed curve is upwards rather than downwards. As the 
writer has observed elsewhere [6], this last data series seems to be rather unique 
in character and is possibly the result of a watershed greatly influenced by 
all year around snow deposits. Possibly a radically different primary prob- 
ability distribution should be used in this case. 


5. Conclusion. The writer has demonstrated in this paper that in fitting a 
theoretical probability distribution of maximum values to annual maxima of 
stream flows, the use of an upper bound for measures of stream flow by assump- 
tion of a primary probability distribution of the type (5)—(6) 

(1) is not inconsistent with the use of the Fisher-Tippett distribution of 
maxima, 

(2) has a reasonable logical basis from the point of view of the hydrologist, 
















(3) may materially affect the estimation of return periods when extrapolation 
is involved, relative to results obtained when no upper bound is assumed. 

It has not been within the scope of this paper to discuss techniques for de- 
termining such an upper bound, nor to apply the theory to enough data series 
to draw conclusions concerning goodness of fit. 


REFERENCES 

[1] E. J. GUMBEL, "The return period of flood flows," Annals of Math. Stat., Vol. 12 (1941), pp. 163-190.

[2] RICHARD VON MISES, "La distribution de la plus grande de n valeurs," Revue Mathématique de l'Union Interbalkanique, Vol. 1, Athens (1936).

[3] BRADFORD F. KIMBALL, "Probability-distribution-curve for flood-control studies," Trans. Am. Geophysical Union, 1938, pp. 466-475.

[4] ALLEN HAZEN, Flood-flows, John Wiley & Sons, New York, 1930.

[5] E. J. GUMBEL, "Probability-interpretation of the observed return-periods of floods," Trans. Am. Geophysical Union, 1941, pp. 836-850.

[6] BRADFORD F. KIMBALL, Discussion of paper entitled "Statistical control curves for flood discharges" by E. J. Gumbel, Trans. Am. Geophysical Union, 1942.





LINEAR RESTRICTIONS ON CHI-SQUARE 


By FRANKLIN E. SATTERTHWAITE 
University of Iowa 


Chi-square is a statistic widely used in statistical analysis. It is usually of the form,

$$\chi^2 = \sum_i \chi_i^2 = \sum_i \Big(\frac{x_i - m_i}{\sigma_i}\Big)^2, \tag{1}$$

where the $x_i$'s are independent normally distributed variables drawn from populations with respective means and standard deviations, $m_i$ and $\sigma_i$. In practical problems the independence of the $x_i$'s is often modified by placing restrictions on the $x_i$'s in order to estimate the $m_i$'s or $\sigma_i$'s. It is well known that if $m$ such restrictions which are linear and homogeneous (also algebraically independent) are placed on the $x_i$'s, then the resulting chi-square, (1), is distributed according to the chi-square distribution with $n - m$ degrees of freedom. The purpose of this paper is to study the case where the restrictions are not necessarily homogeneous.


1. Geometrical development. The $\chi_i$'s of equation (1) may be considered as co-ordinates in an $n$-dimensional space. Equation (1) represents a sphere in such a space with its center at the origin and with radius, $\chi$. We should like to determine the distribution of $\chi^2$. First, since the $\chi_i$'s are independent, we may form their joint distribution,¹

$$F(\chi_1, \chi_2, \ldots, \chi_n)\,dV = Ke^{-\frac{1}{2}\chi_1^2}e^{-\frac{1}{2}\chi_2^2}\cdots\,d\chi_1\,d\chi_2\cdots d\chi_n = Ke^{-\frac{1}{2}\sum\chi_i^2}\,d\chi_1\,d\chi_2\cdots d\chi_n = Ke^{-\frac{1}{2}\chi^2}\,dV. \tag{2}$$

We may change the variable in (2) to $\chi^2$ if we can determine $dV$. Since the $n$-dimensional sphere represented by equation (1) has a volume proportional to $\chi^n$, we may write

$$dV = K\,d(\chi^2)^{\frac{1}{2}n} = K(\chi^2)^{\frac{1}{2}n-1}\,d\chi^2.$$

Substituting this value in the distribution (2) we obtain for the distribution of chi-square,

$$F(\chi^2)\,d\chi^2 = K(\chi^2)^{\frac{1}{2}n-1}e^{-\frac{1}{2}\chi^2}\,d\chi^2,$$

which is the usual form of the chi-square distribution for $n$ degrees of freedom.

¹ The letter K will be used throughout as a constant, not necessarily the same constant from equation to equation.


We shall next restrict the values of $\chi_i$ by means of a condition,

$$a_{11}\chi_1 + a_{12}\chi_2 + \cdots + a_{1n}\chi_n = \rho_1, \qquad \sum_j a_{1j}^2 = 1, \tag{3}$$

where $\rho_1$ is a constant. This restriction represents a hyper-plane in our $n$-dimensional space at a distance $\rho_1$ from the origin. The intersection of this hyper-plane with our sphere (1) is an $(n-1)$-dimensional sphere with radius

$$\chi' = (\chi^2 - \rho_1^2)^{\frac{1}{2}}.$$

The differential of the volume of this sphere is

$$dV = K(\chi^2 - \rho_1^2)^{\frac{1}{2}(n-1)-1}\,d\chi^2.$$

Substituting this in the distribution (2) we obtain the distribution of chi-square subject to the single linear restriction, (3). Thus

$$F(\chi^2)\,d\chi^2 = K(\chi^2 - \rho_1^2)^{\frac{1}{2}(n-1)-1}e^{-\frac{1}{2}\chi^2}\,d\chi^2,$$

or more conveniently,

$$F(\chi^2 - \rho_1^2)\,d(\chi^2 - \rho_1^2) = K(\chi^2 - \rho_1^2)^{\frac{1}{2}(n-1)-1}e^{-\frac{1}{2}(\chi^2 - \rho_1^2)}\,d(\chi^2 - \rho_1^2).$$


The argument may be readily extended to include additional linear restrictions
of the form,

(4) $$a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = \rho_2,$$
$$\cdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = \rho_m, \qquad \sum_j a_{ij}^2 = 1.$$

For convenience we shall assume that the restrictions form an orthogonal set$^2$
so that

$$\sum_j a_{ij}a_{kj} = 0, \qquad i \ne k.$$

The hyper-plane represented by equation (4) is at a distance, $\rho_2$, from the origin.
Since (4) is orthogonal to (3), it is also at a distance, $\rho_2$, from the center of the
$(n - 1)$-dimensional sphere obtained on applying the first restriction. Therefore
the intersection of this hyper-plane with the $(n - 1)$-dimensional sphere
will give an $(n - 2)$-dimensional sphere of radius

$$\chi'' = (\chi^2 - \rho_1^2 - \rho_2^2)^{\frac{1}{2}}.$$

Similarly, if we consider all m restrictions, we obtain an $(n - m)$-dimensional
sphere with radius

$$\chi^{(m)} = (\chi^2 - \sum\rho_i^2)^{\frac{1}{2}}.$$


$^2$ Any set of linear restrictions which are algebraically independent and consistent may
be replaced by an orthogonal set. Thus if (4) were not orthogonal to (3), we could replace
(4) by (4) $- k$(3) where $k$ is determined by the condition

$$\sum_j a_{1j}(a_{2j} - k a_{1j}) = 0$$

or

$$\sum_j a_{1j}a_{2j} = k\sum_j a_{1j}^2.$$








328 FRANKLIN E. SATTERTHWAITE 





The differential of the volume of this sphere will be

$$dV = K(\chi^2 - \sum\rho_i^2)^{\frac{1}{2}(n-m)-1}\,d(\chi^2 - \sum\rho_i^2).$$

Substituting this in (2) we see that

$$(\chi^{(m)})^2 = \chi^2 - \sum\rho_i^2$$

is distributed as is chi-square with $n - m$ degrees of freedom.
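
The theorem can be checked by simulation. The following is a minimal sketch, not part of the original argument: it assumes $n = 5$ standard normal deviates and two orthonormal restrictions whose coefficients and right-hand sides (`rows`, `rhos`) are chosen arbitrarily for illustration. Each draw is moved along the restriction vectors until the restrictions hold, and the corrected statistic $\chi^2 - \sum\rho_i^2$ should then have mean $n - m$ and variance $2(n - m)$.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def restricted_chi_square(rows, rhos, rng):
    """Draw normal deviates x subject to rows[i] . x == rhos[i]; return sum of x_j^2.

    Assumes the rows are orthonormal, as in the text."""
    n = len(rows[0])
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = list(z)
    for a, rho in zip(rows, rhos):
        shift = rho - dot(a, z)          # distance still to travel along a
        x = [xj + shift * aj for xj, aj in zip(x, a)]
    return sum(xj * xj for xj in x)

n, m = 5, 2
rows = [
    [1 / math.sqrt(5)] * 5,                           # plays the role of (3)
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0, 0, 0],   # plays the role of (4)
]
rhos = [1.5, -0.7]                                    # illustrative distances
rng = random.Random(0)
trials = 20000
corrected = [restricted_chi_square(rows, rhos, rng) - sum(r * r for r in rhos)
             for _ in range(trials)]
mean = sum(corrected) / trials
var = sum((c - mean) ** 2 for c in corrected) / (trials - 1)
print(f"mean = {mean:.3f} (theory {n - m}), variance = {var:.3f} (theory {2 * (n - m)})")
```

The corrected statistic is never negative, since it is the squared length of the part of the sample point lying inside the restrictive planes.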

















2. Alternate analytic development. It is perhaps desirable that we present
an analytic proof of the foregoing theorem. Therefore we shall first regard the
$\rho_i$'s as variables and shall determine the joint distribution of $\chi^2$ and the $\rho_i$'s.
We may then pass to the distribution of those values of $\chi^2$ which correspond to
assigned values of the $\rho_i$'s. Note that the $x_i$'s are considered to be statistically
independent.

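The characteristic function of the joint distribution of $\chi^2$ and the $\rho_i$'s is
known to be$^3$

$$\phi(t, t_1, \cdots, t_m) = \frac{e^{-Q/2(1-2it)}}{(1 - 2it)^{\frac{1}{2}n}},$$

where

$$Q = \sum_i\sum_j\sum_k a_{ik}a_{jk}t_it_j = \sum_i t_i^2, \qquad \text{since } \sum_k a_{ik}a_{jk} = \delta_{ij}.$$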


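Applying the Fourier transform, we obtain the joint distribution of $\chi^2$ and
the $\rho_j$'s:

$$F(\chi^2, \rho_1, \cdots, \rho_m) = K\int\cdots\int \frac{e^{Q'}}{(1 - 2it)^{\frac{1}{2}n}}\,dt\,dt_1\cdots dt_m,$$

where

$$Q' = -it\chi^2 - i\sum t_j\rho_j - \{\sum t_j^2/2(1 - 2it)\}$$

$$= -it\chi^2 - \frac{\sum[t_j + i\rho_j(1 - 2it)]^2}{2(1 - 2it)} - \tfrac{1}{2}(1 - 2it)\sum\rho_j^2.$$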





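Performing the integration with respect to $t_1, \cdots, t_m$, we have,

$$F = Ke^{-\frac{1}{2}\sum\rho_j^2}\int e^{-it(\chi^2 - \sum\rho_j^2)}(1 - 2it)^{-\frac{1}{2}(n-m)}\,dt,$$

and finally,

$$F = K(\chi^2 - \sum\rho_j^2)^{\frac{1}{2}(n-m)-1}e^{-\frac{1}{2}\chi^2}.$$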








3 See A. T. Craig, ‘‘A certain mean value problem in statistics,’ Bull. Amer. Math. Soc., 
Vol. 42 (1936), p. 671. 











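In our problem we want the distribution of $\chi^2$ (or more conveniently, of $\chi^2 - \sum\rho_j^2$)
when the $\rho_j$'s take on fixed values. To obtain this we substitute fixed values,
$\rho_j$'s, into the joint distribution and divide by the marginal total,

$$\int F(\chi^2, \rho_1, \cdots, \rho_m)\,d\chi^2 = K\,\Gamma[\tfrac{1}{2}(n - m)]\,2^{\frac{1}{2}(n-m)}e^{-\frac{1}{2}\sum\rho_j^2}.$$

This gives us the distribution function,

$$\frac{1}{\Gamma[\frac{1}{2}(n - m)]\,2^{\frac{1}{2}(n-m)}}\,(\chi^2 - \sum\rho_j^2)^{\frac{1}{2}(n-m)-1}e^{-\frac{1}{2}(\chi^2 - \sum\rho_j^2)},$$

which is a chi-square distribution with $n - m$ degrees of freedom.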


3. Application. As an example of the use of linear restrictions on chi-square 
we shall now examine the effect on the chi-square test of goodness of fit if the 
moments of a sample are not corrected for grouping errors in fitting a frequency 
curve. 

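The parameters of the fitted frequency distribution, $f(x)$, are determined from
the equations,

(5) $$N\int x^i f(x)\,dx = \sum_j x_j^i\,\theta_j,$$

where $x_j$ is the mid-point of the $j$th group and $\theta_j$ the corresponding observed
frequency. Next a set of expected frequencies,

$$\bar\theta_j = \int_{a_j}^{a_{j+1}} N f(x)\,dx, \qquad a_j = (x_{j-1} + x_j)/2,$$

is determined by taking partial areas of the fitted frequency distribution. The
expected frequency is used to transform the actual frequency into a statistic
with mean zero and unit variance by the equation,

$$x_j = (\theta_j - \bar\theta_j)/\bar\theta_j^{\frac{1}{2}}.$$

Equations (5) may now be rearranged into the form of linear restrictions on the
$x_j$. Thus

(6) $$\sum_j x_j^i\,\bar\theta_j^{\frac{1}{2}}\,x_j = \rho_i,$$

in which the power $x_j^i$ refers to the mid-point and the last factor $x_j$ to the
standardized statistic, and where the $\rho_i$ have the values,

$$\rho_i = \sum_j x_j^i\theta_j - \sum_j x_j^i\bar\theta_j = N\int x^i f(x)\,dx - \sum_j x_j^i\bar\theta_j$$

$$\ne 0 \text{ in general.}$$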







To make our example more specific, let us fit a normal distribution to a sample
of 1000 items with mean zero and unit variance. Let the grouping be about
the midpoints,

$x_j$: $-3, -2, -1, 0, 1, 2, 3.$

The expected frequencies in each group are

$\bar\theta_j$: 6, 61, 242, 382, 242, 61, 6.

The variance of these expected frequencies is 1.080 as contrasted with 1.000
for the sample. The linear restrictions, (6), now take the forms,

(7) $2.4x_{-3} + 7.8x_{-2} + 15.6x_{-1} + 19.5x_0 + 15.6x_1 + 7.8x_2 + 2.4x_3 = 0$
(8) $-7.2x_{-3} - 15.6x_{-2} - 15.6x_{-1} + 0 + 15.6x_1 + 15.6x_2 + 7.2x_3 = 0$
(9) $21.6x_{-3} + 31.2x_{-2} + 15.6x_{-1} + 0 + 15.6x_1 + 31.2x_2 + 21.6x_3 = -80.$

Because of the symmetry of the normal distribution, restriction (8) is orthogonal
to (7) and (9). Therefore the only orthogonalization necessary is to replace
(9) by an equivalent restriction which is orthogonal to (7). This can be done
by subtracting 1.080 times (7) from (9) which gives

(10) $19.0x_{-3} + 22.8x_{-2} - 1.2x_{-1} - 21.1x_0 - 1.2x_1 + 22.8x_2 + 19.0x_3 = -80.$

If these restrictions are each divided by the square root of the sum of the squares
of the coefficients of the $x_j$, they will be the normal orthogonal set required
by the development. The distances of these restrictive planes from the center
of the $\chi^2$-sphere are

$$\rho_{(7)} = 0, \qquad \rho_{(8)} = 0, \qquad \rho_{(10)} = 1.7.$$
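
The arithmetic of this example can be retraced numerically. The sketch below is an illustration only: it takes the midpoints and expected frequencies quoted above, builds the coefficients of restrictions (7) to (9) from (6), orthogonalizes them by the Gram-Schmidt process described in footnote 2, and reports the distance of each plane from the origin. The variable names are assumptions of the sketch.

```python
import math

midpoints = [-3, -2, -1, 0, 1, 2, 3]
expected = [6, 61, 242, 382, 242, 61, 6]   # expected frequencies from the text
n_items = 1000
# N times the sample moments of order 0, 1, 2 (mean zero, unit variance).
sample_sums = [n_items, 0.0, n_items]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Restrictions (7)-(9): coefficients x_j^i * sqrt(expected_j), right side rho_i.
rows, rhs = [], []
for i in range(3):
    rows.append([x ** i * math.sqrt(e) for x, e in zip(midpoints, expected)])
    rhs.append(sample_sums[i] - sum(x ** i * e for x, e in zip(midpoints, expected)))

# Gram-Schmidt: make each restriction orthogonal to the earlier ones.
ortho_rows, ortho_rhs = [], []
for row, r in zip(rows, rhs):
    for prev, prev_r in zip(ortho_rows, ortho_rhs):
        k = dot(row, prev) / dot(prev, prev)
        row = [a - k * b for a, b in zip(row, prev)]
        r = r - k * prev_r
    ortho_rows.append(row)
    ortho_rhs.append(r)

distances = [abs(r) / math.sqrt(dot(row, row)) for row, r in zip(ortho_rows, ortho_rhs)]
correction = sum(d * d for d in distances)
print([round(d, 2) for d in distances], round(correction, 2))
```

With the rounded frequencies used here the third distance comes out near 1.69 and the sum of squares near 2.86, close to the figures 1.7 and 2.8 given in the text.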


Thus if we test the goodness of fit of the normal distribution to this sample by
calculating chi-square,

$$\chi^2 = \sum_j x_j^2 = \sum_j \frac{(\theta_j - \bar\theta_j)^2}{\bar\theta_j},$$

we should subtract from $\chi^2$ a correction of

$$\sum\rho_i^2 = 2.8$$

before judging the significance. This correction adjusts for the effect of the
grouping error on the chi-square test.

In this example, chi-square has four degrees of freedom so that an error of 
2.8 is large enough to affect our judgment of its significance. It can be shown 
that the correction is proportional to the size of the sample. Therefore, if our 
sample had contained only 100 items, the fit obtained by ignoring grouping 
effects would be almost as good as the fit when the sample moments were cor-
rected for grouping. On the other hand, if the sample had 10,000 items, it
would be practically impossible to obtain a satisfactory fit without correcting
for grouping errors. 
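
The proportionality to sample size can be made concrete. The following sketch is illustrative and rests on two assumptions not stated in this form in the text: the expected proportions in each group stay fixed as the sample size changes, and the sample always has mean zero, so that only restriction (10) contributes. The function name `grouping_correction` is hypothetical.

```python
midpoints = [-3, -2, -1, 0, 1, 2, 3]
proportions = [0.006, 0.061, 0.242, 0.382, 0.242, 0.061, 0.006]  # expected / 1000

def grouping_correction(n_items, sample_variance=1.0):
    """Sum of rho_i^2 for a sample of n_items with mean zero and the given variance."""
    second = sum(x ** 2 * p for x, p in zip(midpoints, proportions))   # 1.080
    fourth = sum(x ** 4 * p for x, p in zip(midpoints, proportions))
    rho = n_items * (sample_variance - second)      # right side of (10)
    norm_sq = n_items * (fourth - second ** 2)      # squared length of (10)
    return rho ** 2 / norm_sq

for size in (100, 1000, 10000):
    print(size, round(grouping_correction(size), 2))
```

Under these assumptions the correction is about 0.3 for 100 items, about 2.9 for 1000 items and about 29 for 10,000 items, to be set against a chi-square with four degrees of freedom.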


4. Conclusion. The theory of the loss of degrees of freedom for chi-square
when the underlying statistics are subject to linear restrictions does not require
the restrictions to be homogeneous. For restrictions which are not homogeneous,
a correction must be subtracted from chi-square equal to the square of the
distance from the center of the sphere,

$$\chi^2 = \sum x_i^2,$$

to the intersection of the restrictive planes. Non-homogeneous restrictions
sometimes arise in practice because of the bias introduced by an approximation.
An example is given from curve fitting.













SYSTEMS OF LINEAR EQUATIONS WITH COEFFICIENTS SUBJECT 
TO ERROR 


By A. T. Lonseth
Iowa State College 


1. Introduction. Various scientific problems lead to non-homogeneous systems
of n linear equations in n unknowns, in which the $n^2 + n$ coefficients (including
"absolute" terms) are subject to error. Such errors may be errors of
observation, or errors introduced by rounding off decimal expansions. If the 
system has a non-vanishing determinant, the ordinary rules yield the solution. 
But the question arises: how may the possible errors in the coefficients affect the 
solutions? In particular, one would like to know how to exclude the fatal event 
that some malicious combination of errors might make the determinant zero. 
One would further like to have limitations on the solution-errors in terms of 
maximum coefficient-errors. Considering the coefficient-errors as random vari- 
ables, one may also inquire as to the probability distributions of the solution- 
errors. 

The principal result obtained in this paper is the Taylor’s expansion of the 
error in any unknown, considered as a function of the n(n + 1) errors in the 
coefficients. An upper bound is obtained for each term of this series, and the 
sum of these upper bounds (when convergent) is expressed in closed form. Thus 
are obtained not only approximations to the maximum error, but an actual upper 
limit. Convergence of the power series is established for sufficiently small 
coefficient-errors; “‘sufficient smallness”’ is specified in terms of a simple criterion, 
which simultaneously provides a sufficient condition for the non-vanishing of a 
determinant with elements subject to error. 

These results were obtained before I learned that work had already been done 
on the problem. The earliest seems to be that of F. R. Moulton [2] in 1913; he 
found the first order approximation (6) for n = 3, and discussed the geometrical 
reasons for sensitivity. Much later I. M. H. Etherington [1], evidently un- 
aware of Moulton’s paper, found the expression for the total error of a deter- 
minant whose elements may be in error, and applied this to the present problem. 
He thus found limits for the first and second order errors, in a rather different 
form from mine. The probabilistic considerations of section 5 were suggested 
by Etherington's article. L. B. Tuckerman [3] recently discussed the question
of estimating computational errors incurred in the course of solution. He con- 
sidered only errors of first order. 

My original procedure was to compute the terms of the Taylor’s series as 
successive differentials of the unknown, from Cramer’s formula. This soon be- 
comes laborious, and I found only the first two terms. The linear matrix equa- 
tion (4) was then kindly suggested to me by R. Oldenburger. Here (4) is solved 
by iteration, resulting in a simple recursion formula for successive terms of the 
Taylor’s series. 

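2. Formal matrix solution. Let the system of equations be

(1) $$\sum_{j=1}^{n} a_{ij}x_j = c_i, \qquad i = 1, \cdots, n.$$

In terms of the matrices

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \qquad X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad C = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix},$$

system (1) can be written

(2) $$AX = C.$$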


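Supposing that not all $c$'s vanish, and that $\Delta$, the determinant of $A$, does not
vanish, there is a unique solution $X$. But the $a$'s and $c$'s, and consequently the
$x$'s, are subject to error: let the true value of $a_{ij}$ be $a_{ij} + \alpha_{ij}$; of $c_i$, $c_i + \gamma_i$;
and of the resulting $x_j$, $x_j + \xi_j$. We must actually deal with the system

(3) $$(A + \alpha)(X + \xi) = C + \gamma,$$

where we have written

$$\alpha = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}, \qquad \xi = \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}, \qquad \gamma = \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_n \end{pmatrix}.$$

Expanding (3) and using (2), we find for the error-matrix $\xi$

(4) $$\xi = m + nX + n\xi,$$

with $m = A^{-1}\gamma$, $n = -A^{-1}\alpha$; $A^{-1}$ is the inverse of $A$. We solve (4) formally
for $\xi$ by iteration. Thus

$$\xi = m + nX + n(m + nX) + n^2\xi, \text{ etc.};$$

and there results the infinite expansion

(5) $$\xi = \sum_{k=1}^{\infty}\xi^{(k)}: \qquad \xi^{(1)} = m + nX; \qquad \xi^{(k)} = n\xi^{(k-1)}, \quad k > 1.$$

In section 4 convergence of (5) will be established for sufficiently small $|\alpha_{ij}|$.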
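
The iteration (5) is easy to try on a small case. The sketch below is an illustration under stated assumptions: a 2 by 2 system and error matrices whose numerical values are invented for the purpose, with hand-written helpers in place of a matrix library. It sums the terms $\xi^{(k)}$ and compares the total with the error found by solving the perturbed system (3) directly.

```python
def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def mat_mul(p, q):
    cols = list(zip(*q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in p]

A = [[4.0, 1.0], [2.0, 3.0]]                 # illustrative coefficients
C = [1.0, 2.0]
alpha = [[0.010, -0.020], [0.015, 0.005]]    # assumed errors in the a's
gamma = [0.010, -0.010]                      # assumed errors in the c's

A_inv = inverse_2x2(A)
X = mat_vec(A_inv, C)
m = mat_vec(A_inv, gamma)
n = [[-v for v in row] for row in mat_mul(A_inv, alpha)]

term = [mi + ni for mi, ni in zip(m, mat_vec(n, X))]   # xi^(1) = m + nX
xi = [0.0, 0.0]
for _ in range(30):
    xi = [s + t for s, t in zip(xi, term)]
    term = mat_vec(n, term)                             # xi^(k) = n xi^(k-1)

# Exact error from solving the perturbed system (3) directly.
A_true = [[a + e for a, e in zip(ra, re)] for ra, re in zip(A, alpha)]
C_true = [c + g for c, g in zip(C, gamma)]
exact = [xt - x for xt, x in zip(mat_vec(inverse_2x2(A_true), C_true), X)]
print(xi, exact)
```

For errors of this size the two results agree to rounding error after a handful of terms, each term being smaller than the last by roughly the factor discussed in section 4.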


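3. The elements of $\xi^{(k)}$. It is necessary to consider closely the individual
elements of $\xi^{(k)}$. Writing

$$\xi^{(k)} = \begin{pmatrix} \xi_1^{(k)} \\ \vdots \\ \xi_n^{(k)} \end{pmatrix},$$

we note from (5) that

$$\xi_j = \sum_{k=1}^{\infty}\xi_j^{(k)};$$

this is precisely the Taylor's series for the error in $x_j$: each $\xi_j^{(k)}$ is a homogeneous
polynomial of degree $k$ in the $\alpha$'s and $\gamma$'s. Writing $A_{ij}$ for the cofactor of $a_{ij}$
in $\Delta$,

$$\xi^{(1)} = m + nX = A^{-1}(\gamma - \alpha X) = \frac{1}{\Delta}\begin{pmatrix} A_{11} & \cdots & A_{n1} \\ \vdots & & \vdots \\ A_{1n} & \cdots & A_{nn} \end{pmatrix}\begin{pmatrix} \gamma_1 - \alpha_{11}x_1 - \cdots - \alpha_{1n}x_n \\ \vdots \\ \gamma_n - \alpha_{n1}x_1 - \cdots - \alpha_{nn}x_n \end{pmatrix},$$

whence (summing hereafter from 1 to n on Greek-letter subscripts)

(6) $$\xi_j^{(1)} = \frac{1}{\Delta}\Big\{\sum_\mu \gamma_\mu A_{\mu j} - x_1\sum_\mu \alpha_{\mu 1}A_{\mu j} - \cdots - x_n\sum_\mu \alpha_{\mu n}A_{\mu j}\Big\}.$$

From (5), if $k > 1$,

$$\xi^{(k)} = n\xi^{(k-1)} = -A^{-1}\alpha\,\xi^{(k-1)},$$

so that

(7) $$\xi_j^{(k)} = -\frac{1}{\Delta}\sum_\nu \xi_\nu^{(k-1)}\sum_\mu \alpha_{\mu\nu}A_{\mu j}.$$

The sums $\sum\gamma_\mu A_{\mu j}$, $\sum\alpha_{\mu\nu}A_{\mu j}$ have obvious interpretations as determinants.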


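4. Bounds and convergence of the series. Assuming $|\alpha_{ij}|, |\gamma_i| \le \delta$ and
taking absolute values in (6),

(8) $$|\xi_j^{(1)}| \le \frac{\delta}{|\Delta|}\Big(1 + \sum_\nu |x_\nu|\Big)\Big(\sum_\mu |A_{\mu j}|\Big).$$

It will be observed that equality can be attained for a particular choice of $\alpha$'s
and $\gamma$'s as $\pm\delta$: the bound for first-order errors is best possible. But it is not in
general possible by a single choice of $\alpha$'s and $\gamma$'s to obtain equality for all $j$.
Similarly from (7)

$$|\xi_j^{(k)}| \le \frac{\delta}{|\Delta|}\sum_\nu |\xi_\nu^{(k-1)}|\sum_\mu |A_{\mu j}|, \qquad k > 1;$$

whence by induction

(9) $$|\xi_j^{(k)}| \le \rho^{k-1}\,\frac{\delta}{|\Delta|}\Big(1 + \sum_\nu |x_\nu|\Big)\Big(\sum_\mu |A_{\mu j}|\Big), \qquad \rho = \frac{\delta}{|\Delta|}\sum_\mu\sum_\nu |A_{\mu\nu}|.$$

Summing on $k$,

$$\sum_{k=1}^{m}|\xi_j^{(k)}| \le \frac{\delta}{|\Delta|}\Big(1 + \sum_\nu |x_\nu|\Big)\Big(\sum_\mu |A_{\mu j}|\Big)\sum_{k=1}^{m}\rho^{k-1} = \frac{1 - \rho^m}{1 - \rho}\,\frac{\delta}{|\Delta|}\Big(1 + \sum_\nu |x_\nu|\Big)\Big(\sum_\mu |A_{\mu j}|\Big).$$

If $\rho < 1$, we can let $m \to \infty$:

(10) $$|\xi_j| \le \frac{\delta}{|\Delta|}\Big(1 + \sum_\nu |x_\nu|\Big)\Big(\sum_\mu |A_{\mu j}|\Big)\Big/(1 - \rho).$$

Observing that the $\gamma$'s occur linearly in (6) and (7), we conclude that (5) converges
if

(11) $$|\alpha_{ij}| \le \delta < |\Delta|\Big/\Big(\sum_\mu\sum_\nu |A_{\mu\nu}|\Big).$$

It follows that the determinant of the system (3) cannot vanish if (11) holds.
This is rather remarkable, in that $\delta\sum\sum|A_{\mu\nu}|$ is merely the maximum first-order
term in the error of that determinant ([1], p. 108); the effect of higher order
terms (i.e., of any but first-order minors) in producing a zero determinant can
be wholly ignored.

From the remark after (8), it appears that equality in (9) and (10) cannot
generally be attained.

If (10) is written $|\xi_j| \le B/(1 - \rho)$, it is easily seen that the remainder after
the $h$th approximation does not exceed $\rho^h B/(1 - \rho)$.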
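
Bounds (8), (10) and criterion (11) can be evaluated directly for a small system. The sketch below is an illustration only: the 2 by 2 coefficients and the value of $\delta$ are invented, and the cofactors are written out by hand for the 2 by 2 case. It compares bound (10) with the largest error actually observed when each of the six errors is set to $+\delta$ or $-\delta$.

```python
from itertools import product

A = [[4.0, 1.0], [2.0, 3.0]]   # illustrative coefficients
C = [1.0, 2.0]
delta = 0.02                   # assumed maximum coefficient-error

def solve_2x2(a, c):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(c[0] * a[1][1] - a[0][1] * c[1]) / det,
            (a[0][0] * c[1] - c[0] * a[1][0]) / det]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
# cof[mu][nu] is the cofactor of a[mu][nu] in the 2x2 determinant.
cof = [[A[1][1], -A[1][0]], [-A[0][1], A[0][0]]]
X = solve_2x2(A, C)

cof_total = sum(abs(c) for row in cof for c in row)
delta_max = abs(det) / cof_total                 # right side of (11)
rho = delta * cof_total / abs(det)
first_order = [delta / abs(det) * (1 + sum(abs(x) for x in X))
               * (abs(cof[0][j]) + abs(cof[1][j])) for j in range(2)]   # bound (8)
bound = [b / (1 - rho) for b in first_order]                            # bound (10)

# Worst observed error over every choice of signs +/- delta for the six errors.
worst = [0.0, 0.0]
for signs in product((-delta, delta), repeat=6):
    a_true = [[A[0][0] + signs[0], A[0][1] + signs[1]],
              [A[1][0] + signs[2], A[1][1] + signs[3]]]
    c_true = [C[0] + signs[4], C[1] + signs[5]]
    xi = [t - x for t, x in zip(solve_2x2(a_true, c_true), X)]
    worst = [max(w, abs(e)) for w, e in zip(worst, xi)]
print(delta_max, rho, bound, worst)
```

In this example the observed worst errors lie just above the first-order bound (8) and below (10), which is the behaviour the section leads one to expect when $\rho$ is small.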


5. Probability distributions. We now consider some consequences of the
following assumptions: the $\alpha$'s and $\gamma$'s are identical, independent random variables,
bounded by a $\delta$ satisfying (11), and distributed symmetrically about zero.
(It would be reasonable to assume further that they possess a frequency function,
which is nowhere concave upward.) Writing $\mathcal{E}(x)$ for "expectation of the
random variable $x$," we have

$$\mathcal{E}(\alpha_{ij}) = \mathcal{E}(\gamma_i) = 0, \qquad \mathcal{E}(\alpha_{ij}^2) = \mathcal{E}(\gamma_i^2) = \sigma^2 \le \delta^2.$$

On account of independence and symmetry, the expectation of any power-product
of $\alpha$'s and $\gamma$'s containing an odd power must be zero. To first order,
the mean $a_j$ of the solution-error $\xi_j$ is given by

(12) $$a_j^{(1)} = \mathcal{E}(\xi_j^{(1)}) = 0;$$

and the standard deviation $S_j$ by

(13) $$S_j^{(1)} = \sqrt{\mathcal{E}\{(\xi_j^{(1)})^2\}} = \frac{\sigma}{|\Delta|}\Big\{\Big(1 + \sum_\nu x_\nu^2\Big)\sum_\mu A_{\mu j}^2\Big\}^{\frac{1}{2}}.$$

The second approximation to $a_j$ is also easily obtained:

(14) $$a_j^{(2)} = \mathcal{E}(\xi_j^{(2)}) = \frac{\sigma^2}{\Delta^2}\sum_\nu x_\nu\sum_\mu A_{\mu\nu}A_{\mu j}.$$

Both (13) and (14) were given by Etherington [1], though in a less symmetric
form. Higher approximations, as he remarks, involve complicated summations;
but if they should ever be required, the machinery exists in (6) and (7) for their
systematic computation. As to the errors in using (13) for the standard deviation
$S_j$ and (14) for the mean, we know only that

$$a_j = a_j^{(2)} + o(\delta^2), \qquad S_j^2 = (S_j^{(1)})^2 + o(\delta^2).$$
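
Formula (13) can be compared with a simulation. The sketch below is an illustration under assumptions of its own: a 2 by 2 system with invented coefficients, and errors drawn independently from a uniform distribution on $(-\delta, \delta)$, for which $\sigma = \delta/\sqrt{3}$. It estimates the standard deviation of each solution-error from repeated exact solutions of the perturbed system.

```python
import math
import random

A = [[4.0, 1.0], [2.0, 3.0]]     # illustrative coefficients
C = [1.0, 2.0]
delta = 0.02
sigma = delta / math.sqrt(3)     # standard deviation of a uniform error on (-delta, delta)

def solve_2x2(a, c):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(c[0] * a[1][1] - a[0][1] * c[1]) / det,
            (a[0][0] * c[1] - c[0] * a[1][0]) / det]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
cof = [[A[1][1], -A[1][0]], [-A[0][1], A[0][0]]]   # cofactors, 2x2 case
X = solve_2x2(A, C)
predicted = [sigma / abs(det) * math.sqrt((1 + sum(x * x for x in X))
             * (cof[0][j] ** 2 + cof[1][j] ** 2)) for j in range(2)]   # formula (13)

rng = random.Random(1)
trials = 50000
sums = [0.0, 0.0]
squares = [0.0, 0.0]
for _ in range(trials):
    a_true = [[a + rng.uniform(-delta, delta) for a in row] for row in A]
    c_true = [c + rng.uniform(-delta, delta) for c in C]
    xi = [t - x for t, x in zip(solve_2x2(a_true, c_true), X)]
    sums = [s + e for s, e in zip(sums, xi)]
    squares = [q + e * e for q, e in zip(squares, xi)]
observed = [math.sqrt(q / trials - (s / trials) ** 2) for s, q in zip(sums, squares)]
print(predicted, observed)
```

The second-order mean (14) is far smaller than the standard deviation in a case like this, so a much larger number of trials would be needed to see it in a simulation.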


Etherington ([1], p. 111) considers the important special case of "rounding
off" decimal expressions. Each $a$ and $c$ is supposed correct in the $q$th decimal
place, the $(q + 1)$th figure being "forced," i.e., increased by one when the
$(q + 2)$th figure is dropped, if the $(q + 2)$th is 5, 6, 7, 8, or 9. Assuming constant
frequency $10^{q+1}$ in the interval $(-\frac{1}{2}10^{-q-1}, \frac{1}{2}10^{-q-1})$, we may use (13) and (14) with
$\sigma^2 = 10^{-2q-2}/12$.

Errors of observation are often assumed to be normally distributed. There is 
nothing against such an assumption with regard to the y’s, but the a’s must not 
make (3) singular, and must accordingly be suitably bounded, e.g. by (11). 
























6. Conclusion. The formulas and bounds of this paper involve only these
quantities: the determinant $\Delta$, its first order minors, and the solutions of (1).
They can be found in the course of solving (1) by orthodox methods.

Inequality (10) definitely limits the maximum solution-errors, in terms of the
maximum coefficient-error $\delta$, provided $\delta$ satisfies (11). But it may be that (8),
either alone or in conjunction with the second-order bound from (9), will give a
better approximation.

The ratio $\sum\sum|A_{\mu\nu}|/|\Delta|$ may be taken as a "measure of sensitivity" of (1)
to error.

The fundamental formulas (6) and (7) are capable of solving other problems
than those studied here. For example, it may happen that only certain elements
(such as those of a single column) are in error, in which case better inequalities
can be found. Or the $\alpha$'s and $\gamma$'s may not be independently and identically
distributed.











REFERENCES 

[1] I. M. H. Etherington, "On errors in determinants," Proc. Edinburgh Math. Soc.,
Ser. 2, Vol. 3 (1932), pp. 107-117.

[2] F. R. Moulton, "On the solutions of linear equations having small determinants,"
Amer. Math. Monthly, Vol. 20 (1913), pp. 242-249.

[3] L. B. Tuckerman, "On the mathematically significant figures in the solution of simultaneous
linear equations," Annals of Math. Stat., Vol. 12 (1941), pp. 307-316.

[4] P. G. Hoel, "The errors involved in evaluating correlation determinants," Annals of
Math. Stat., Vol. 11 (1940), pp. 58-65.





ON MUTUALLY FAVORABLE EVENTS 


By Kai-Lai Chung


Tsing Hua University, Kunming, China 


Introduction. For a set of arbitrary events, E. J. Gumbel, M. Fréchet and the
author$^1$ have recently obtained inequalities between sums of certain probability
functions. One of the results of the author is the following:

Let $E_1, \cdots, E_n$ be n arbitrary events and let $p_m(\nu_1, \cdots, \nu_k)$ denote the
probability of the occurrence of at least m events out of the k events
$E_{\nu_1}, \cdots, E_{\nu_k}$. Then, for $k = 1, \cdots, n - 1$ and $1 \le m \le k$ we have

$$\binom{n-m}{k-m+1}^{-1}\sum p_m(\nu_1, \cdots, \nu_{k+1}) \le \binom{n-m}{k-m}^{-1}\sum p_m(\nu_1, \cdots, \nu_k),$$

where the summations extend respectively to all combinations of $k + 1$ and $k$
indices out of the n indices $1, \cdots, n$.

In the course of the proof of the above inequalities it appears that similar inequalities
between products instead of sums can be obtained under certain assumptions 
regarding the nature of interdependence of the events. We shall first study the 
nature of such assumptions, and then proceed to the proof of the said inequalities 
(Theorems 1 and 2). It may be noted that the inductive method used here 
serves equally well for the proof of the inequalities cited above, though some- 
what longer, but apparently our former method is not applicable here. 

That events satisfying our assumptions actually exist, is shown by an appli- 
cation to the elementary theory of numbers. The author feels incompetent to 
discuss other possible fields of application. 


1. Let a set of events be given

$$E_1, \cdots, E_n, \cdots$$

and let $E_i'$ denote the event non-$E_i$. Let $p(i)$ denote the probability of the
occurrence of $E_i$, $p(i')$ that of the occurrence of $E_i'$. For convenience we
assume that for any $i$, $p(i)[1 - p(i)] \ne 0$; events with the exceptional probabilities
0 or 1 may evidently be left out of account.

Let $p(\nu_1 \cdots \nu_k)$ denote the probability of the occurrence of the conjunction
$E_{\nu_1} \cdots E_{\nu_k}$, and let $p(\mu_1 \cdots \mu_h, \nu_1 \cdots \nu_k)$ denote the probability of the occurrence
of $E_{\nu_1} \cdots E_{\nu_k}$, on the hypothesis that $E_{\mu_1} \cdots E_{\mu_h}$ have occurred. The
$\mu$'s or $\nu$'s may be accented.

DEFINITION 1: If $p(\nu_1, \nu_2) > p(\nu_2)$, we say that the occurrence of the event $E_{\nu_1}$
is favorable to the occurrence of the event $E_{\nu_2}$, or simply that $E_{\nu_1}$ is favorable to $E_{\nu_2}$.


1 “On the probability of the occurrence of at least m events among n arbitrary events,”’ 
Annals of Math. Stat. Vol. 12 (1941), pp. 328-338. 


If $p(\nu_1, \nu_2) = p(\nu_2)$, we say that $E_{\nu_1}$ is indifferent to $E_{\nu_2}$. If $p(\nu_1, \nu_2) < p(\nu_2)$,
we say that $E_{\nu_1}$ is unfavorable to $E_{\nu_2}$.

Thus the relations "favorableness," "indifference," and "unfavorableness" are
mutually exclusive and together exhaustive. We state the following immediate
consequences:


(i) Reflexivity: An event is favorable to itself; in fact, $p(\nu, \nu) = 1 > p(\nu)$.
(ii) Symmetry: If $E_1$ is favorable (indifferent, unfavorable) to $E_2$, then $E_2$
is favorable (indifferent, unfavorable) to $E_1$. In fact, we have

$$p(1)p(1, 2) = p(12) = p(2)p(2, 1),$$

$$\frac{p(1, 2)}{p(2)} = \frac{p(2, 1)}{p(1)}.$$

Thus $p(1, 2) \gtreqless p(2)$ is equivalent to $p(2, 1) \gtreqless p(1)$.
In particular, if $E_1$ is indifferent to $E_2$, then so is $E_2$ to $E_1$. They are then
usually said to be independent of each other.
(iii) If $E_1$ is favorable (indifferent, unfavorable) to $E_2$, then $E_1'$ is unfavorable
(indifferent, favorable) to $E_2$. For, we have

$$p(1)p(1, 2) + p(1')p(1', 2) = p(12) + p(1'2) = p(2),$$

whence

$$p(1')p(1', 2) = p(2) - p(1)p(1, 2).$$

On the other hand,

$$p(1')p(2) = [1 - p(1)]p(2) = p(2) - p(1)p(2).$$

Since by assumption $p(1')p(2) \ne 0$, we have

$$\frac{p(1', 2)}{p(2)} = \frac{p(2) - p(1)p(1, 2)}{p(2) - p(1)p(2)}.$$

Thus

$$p(1', 2) \lesseqgtr p(2) \text{ according as } p(1, 2) \gtreqless p(2).$$


For the sake of brevity we introduce the following symbolic notation:

$$E_1/E_2 = \begin{cases} 1, & \text{if } E_1 \text{ is favorable to } E_2 \\ 0, & \text{if } E_1 \text{ is indifferent to } E_2 \\ -1, & \text{if } E_1 \text{ is unfavorable to } E_2. \end{cases}$$

Then by (ii) and (iii) we have

$$E_1/E_2 = E_2/E_1,$$
$$E_1'/E_2 = E_2/E_1' = E_1/E_2' = E_2'/E_1 = -(E_1/E_2),$$
$$E_1'/E_2' = E_2'/E_1' = E_1/E_2,$$

analogous to the rules of signs in the multiplication of integers.
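
The symbol $E_1/E_2$ and the rules (ii) and (iii) can be tried out on a finite sample space of equally likely outcomes. The sketch below is illustrative: the die-throwing events and the helper `relation` are assumptions of the sketch, not taken from the text. Exact fractions are used so that the indifferent case is detected without rounding trouble.

```python
from fractions import Fraction

def relation(e1, e2, space):
    """Return the symbol E1/E2: 1 favorable, 0 indifferent, -1 unfavorable."""
    conditional = Fraction(len(e1 & e2), len(e1))    # p(1, 2): E2 given E1
    marginal = Fraction(len(e2), len(space))         # p(2)
    return (conditional > marginal) - (conditional < marginal)

space = frozenset(range(1, 7))           # one throw of a fair die (illustrative)
E1 = frozenset({2, 4, 6})                # the number is even
E2 = frozenset({4, 5, 6})                # the number is at least four
not_E1, not_E2 = space - E1, space - E2

print(relation(E1, E2, space), relation(E2, E1, space))              # symmetry (ii)
print(relation(not_E1, E2, space), relation(not_E1, not_E2, space))  # rule (iii)
```

Accenting one event reverses the sign and accenting both restores it, as in the multiplication of signs.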













(iv) Non-transitivity: If $E_1$ is favorable to $E_2$, and $E_2$ is favorable to $E_3$,
it does not necessarily follow that $E_1$ is favorable to $E_3$; in fact, it may happen
that $E_1$ is unfavorable to $E_3$. For instance, imagine 11 identical balls in a bag
marked respectively with the numbers


—1i, +10, —3, —2, —1, 2, 4,6, 11, 12, 16. 






Let a ball be drawn at random. Let

$E_1$ = (the event of the number on the ball being positive)
$E_2$ = (the event of the number on the ball being even)
$E_3$ = (the event of the number on the ball being of 1 digit)
We have 





(v) It may happen that $E_1/E_3 = 1$, $E_2/E_3 = 1$, but $E_1E_2/E_3 = -1$. In the
example above,


p(2,1) = § > % = pil), 

p(3’, 1) =} > *& = pL), 

p(23’, 1) = 3 < = pli). 

(vi) It may happen that $E_1/E_2 = 1$, $E_1/E_3 = 1$, but $E_1/E_2E_3 = -1$. Example:

p(l,2) = § > %& = p2), 

p(l, 3’) = 2 > vr = D3’), 

p(1, 23’) = § < % = p(23’). 


(vii) It may happen that $E_1/E_3 = 1$, $E_2/E_3 = 1$, but the disjunction
$(E_1 + E_2)/E_3 = -1$. For, by (v) we know that there exist events $E_1'$, $E_2'$, $E_3'$
such that

$$E_1'/E_3' = 1, \qquad E_2'/E_3' = 1, \qquad E_1'E_2'/E_3' = -1.$$

Hence by (iii) there exist events $E_1$, $E_2$, $E_3$ such that

$$E_1/E_3 = 1, \qquad E_2/E_3 = 1, \qquad (E_1'E_2')'/E_3 = -1.$$

But $(E_1'E_2')' = E_1 + E_2$. Thus the last relation is $(E_1 + E_2)/E_3 = -1$.
(viii) It may happen that $E_1/E_2 = 1$, $E_1/E_3 = 1$, but $E_1/(E_2 + E_3) = -1$.
This follows from (vi) as (vii) follows from (v).
After all these negative results in (iv)-(viii), we see that we cannot expect to
go far without making stronger assumptions regarding the nature of interdependence
between the events in the set. Firstly, in view of (iv), we shall
restrict ourselves to consideration of a set of events in which each event is
favorable to every other. Secondly, in view of (v), we shall only consider the
case where the "favorableness," as defined above, shall be cumulative in its
effect, that is to say, the more events favorable to a given event have been
known to occur, the more probable this given event shall be esteemed. We
formulate these two conditions in mathematical terms, as follows:

DEFINITION 2: A set of events $E_1, \cdots, E_n, \cdots$ is said to be strongly mutually
favorable (in the first sense) if, for every integer h and every set of distinct indices
(positive integers) $\mu_1, \cdots, \mu_h$ and $\nu$ we have

$$p(\mu_1 \cdots \mu_h, \nu) > p(\mu_1 \cdots \mu_{h-1}, \nu).$$

This definition requires that there exist no implication relation between any
event and any conjunction of events in the set; in particular, that the events
are all distinct. It would be more convenient to consider the relation "favorable
or indifferent to." This will be done later on. The present definitions
have the advantage of being logically clear cut and also that of yielding unambiguous
inequalities.

From Definition 2 we deduce the following consequences:

(1) If the set $(\mu_1', \cdots, \mu_g')$ is a sub-set of $(\mu_1, \cdots, \mu_h)$, we have

$$p(\mu_1 \cdots \mu_h, \nu) > p(\mu_1' \cdots \mu_g', \nu).$$

(2) For any positive integer k and any two sets $(\nu_1, \cdots, \nu_k)$ and $(\mu_1, \cdots, \mu_h)$
where all the indices are distinct, we have

$$p(\mu_1 \cdots \mu_h, \nu_1 \cdots \nu_k) > p(\mu_1 \cdots \mu_{h-1}, \nu_1 \cdots \nu_k).$$

More generally, we have as in (1),

$$p(\mu_1 \cdots \mu_h, \nu_1 \cdots \nu_k) > p(\mu_1' \cdots \mu_g', \nu_1 \cdots \nu_k).$$

PROOF: We have only to prove the first inequality. For k = 1 this is the
assumption in Definition 2. Suppose that the inequality holds for k - 1, we
shall prove that it holds for k, too:

$$\frac{p(\mu_1 \cdots \mu_h, \nu_1 \cdots \nu_k)}{p(\mu_1 \cdots \mu_{h-1}, \nu_1 \cdots \nu_k)} = \frac{p(\mu_1 \cdots \mu_{h-1})\,p(\mu_1 \cdots \mu_h\nu_1 \cdots \nu_k)}{p(\mu_1 \cdots \mu_h)\,p(\mu_1 \cdots \mu_{h-1}\nu_1 \cdots \nu_k)}$$

$$= \frac{p(\mu_1 \cdots \mu_h, \nu_1)\,p(\mu_1 \cdots \mu_h\nu_1, \nu_2 \cdots \nu_k)}{p(\mu_1 \cdots \mu_{h-1}, \nu_1)\,p(\mu_1 \cdots \mu_{h-1}\nu_1, \nu_2 \cdots \nu_k)} > 1,$$

the first pair of factors having a ratio greater than 1 by Definition 2 and the
second pair by the inequality for k - 1.








Observe that none of the denominators vanish by our original assumption and 
by Definition 2. 

Therefore we see that when the failure in (v) is remedied by our definition, 
the failure in (vi) is automatically remedied too. 









2. THEOREM 1: Let $n > 1$ and let $E_1, \cdots, E_n, \cdots$ be a set of strongly mutually
favorable events (in the first sense). Then we have, for $k = 1, \cdots, n - 1$,

$$\prod_{\nu_1 \cdots \nu_{k+1}}[p(\nu_1 \cdots \nu_{k+1})]^{\binom{n-1}{k}^{-1}} > \prod_{\nu_1 \cdots \nu_k}[p(\nu_1 \cdots \nu_k)]^{\binom{n-1}{k-1}^{-1}}$$

where the products extend respectively to all combinations of $k + 1$ and $k$ distinct
indices out of the indices $1, \cdots, n$.
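
A small numerical illustration of Theorem 1 follows. It is a sketch under an assumption made only for the example: the events are taken to be conditionally independent with a common probability that is either `a` or `b`, each with chance one half. Such a set satisfies Definition 2, which the code checks by brute force before comparing the logarithms of the two sides of the theorem.

```python
import math
from itertools import combinations, permutations

n = 4
a, b = 0.2, 0.8   # hypothetical: each event has chance a or b, by a fair coin toss

def p_conj(indices):
    """Probability that all the events with these indices occur."""
    h = len(indices)
    return 0.5 * (a ** h + b ** h)

def p_cond(given, nu):
    """p(mu_1 ... mu_h, nu): probability of E_nu given E_mu_1 ... E_mu_h."""
    return p_conj(given + (nu,)) / p_conj(given)

def strongly_mutually_favorable():
    """Check the inequality of Definition 2 for every h and every choice of indices."""
    for h in range(1, n):
        for given in permutations(range(n), h):
            for nu in set(range(n)) - set(given):
                if not p_cond(given, nu) > p_cond(given[:-1], nu):
                    return False
    return True

def theorem_sides(k):
    """Logarithms of the left and right products in Theorem 1."""
    lhs = sum(math.log(p_conj(c)) for c in combinations(range(n), k + 1)) / math.comb(n - 1, k)
    rhs = sum(math.log(p_conj(c)) for c in combinations(range(n), k)) / math.comb(n - 1, k - 1)
    return lhs, rhs

print(strongly_mutually_favorable())
for k in range(1, n):
    print(k, theorem_sides(k))
```

For this symmetric example the theorem reduces to the statement that $[\tfrac{1}{2}(a^k + b^k)]^{1/k}$ increases with $k$.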


PROOF. We may assume that the indices are written so that

$$1 \le \nu_1 < \nu_2 < \cdots < \nu_{k+1} \le n.$$














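Taking logarithms, we have

$$\binom{n-1}{k-1}\sum_{\nu_1 \cdots \nu_{k+1}}\log p(\nu_1 \cdots \nu_{k+1}) > \binom{n-1}{k}\sum_{\nu_1 \cdots \nu_k}\log p(\nu_1 \cdots \nu_k).$$

Substituting from the obvious formula

$$p(\nu_1 \cdots \nu_k) = p(\nu_1)p(\nu_1, \nu_2)p(\nu_1\nu_2, \nu_3) \cdots p(\nu_1 \cdots \nu_{k-1}, \nu_k),$$

and writing $\log p(\cdots) = q(\cdots)$, the inequality becomes

(1) $$\binom{n-1}{k-1}\sum[q(\nu_1) + q(\nu_1, \nu_2) + \cdots + q(\nu_1 \cdots \nu_k, \nu_{k+1})] > \binom{n-1}{k}\sum[q(\nu_1) + q(\nu_1, \nu_2) + \cdots + q(\nu_1 \cdots \nu_{k-1}, \nu_k)].$$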

Immediately we observe that the number of terms of the form
$q(\mu_1 \cdots \mu_s, \mu)$ $(0 \le s \le \mu - 1)$ with a fixed $\mu$ after the comma in the bracket
is the same on both sides, since

$$\binom{n-1}{k-1}\binom{n-1}{k} = \binom{n-1}{k}\binom{n-1}{k-1}.$$

Let the sums of such $q$'s on the left and right of (1) be $\sigma^{(1)} = \sigma^{(1)}(\mu)$
and $\sigma^{(2)} = \sigma^{(2)}(\mu)$ respectively. To prove our theorem it is sufficient to prove
that $\sigma^{(1)}(\mu) \ge \sigma^{(2)}(\mu)$ for every $\mu$ and $\sigma^{(1)}(\mu) > \sigma^{(2)}(\mu)$ for at least one $\mu$.

Now the terms in $\sigma^{(1)}$ (or $\sigma^{(2)}$) fall into classes according to the number $s$ of the
$\mu_i$'s before the comma in the bracket. Let those terms having $s$ $\mu_i$'s before the
comma belong to the $s$-th class. It is evident that the number of terms of
the $s$-th class in $\sigma^{(1)}(\mu)$ is equal to

$$\binom{n-1}{k-1}\binom{\mu-1}{s}\binom{n-\mu}{k-s}, \qquad s = 0, 1, \cdots, \mu - 1;$$

where we make the convention that

$$\binom{0}{0} = 1, \qquad \binom{a}{b} = 0 \text{ if } a < b \text{ or if } b < 0.$$


. ° 1 ‘ . 

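Thus for a fixed $\mu$, when the terms in $\sigma^{(1)}(\mu)$ are classified in the above manner,
its total number of terms may be written as the following sum, in which vanishing
terms may occur:

(2) $$\binom{n-1}{k-1}\left[\binom{\mu-1}{0}\binom{n-\mu}{k} + \binom{\mu-1}{1}\binom{n-\mu}{k-1} + \cdots + \binom{\mu-1}{s}\binom{n-\mu}{k-s} + \cdots + \binom{\mu-1}{\mu-1}\binom{n-\mu}{k-\mu+1}\right].$$

Similarly the total number of terms in $\sigma^{(2)}(\mu)$ may be written as the
following sum:

(3) $$\binom{n-1}{k}\left[\binom{\mu-1}{0}\binom{n-\mu}{k-1} + \binom{\mu-1}{1}\binom{n-\mu}{k-2} + \cdots + \binom{\mu-1}{s}\binom{n-\mu}{k-s-1} + \cdots + \binom{\mu-1}{\mu-1}\binom{n-\mu}{k-\mu}\right].$$

LEMMA 1: For $0 \le s \le k$, we have, taking account of our conventions about the
binomial coefficients,

(4) $$\binom{n-1}{k-1}\binom{n-\mu}{k-s} \ge \binom{n-1}{k}\binom{n-\mu}{k-s-1} \quad\text{for}\quad s \ge (\mu - 1)k/n,$$

with both signs $\ge$ replaced by $\le$ for $s \le (\mu - 1)k/n$.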


PROOF: Suppose $s \ge k - n + \mu$, then

$$\binom{n-1}{k-1}\binom{n-\mu}{k-s} \gtreqless \binom{n-1}{k}\binom{n-\mu}{k-s-1}$$

according as

$$\frac{k}{n-k} \gtreqless \frac{k-s}{n-\mu-k+s+1},$$

i.e. according as

$$s \gtreqless (\mu - 1)k/n.$$

But, since $k < n$ and $\mu \le n$, we have

$$n - k - k/n + 1 > (n - k)\mu/n,$$
$$(\mu - 1)k/n > k - n + \mu - 1,$$

so that, square brackets denoting the integral part,

$$[(\mu - 1)k/n] + 1 \ge k - n + \mu.$$

Therefore if $s > (\mu - 1)k/n$, then $s \ge [(\mu - 1)k/n] + 1 \ge k - n + \mu$, and (4)
holds.
Again, if $k - n + \mu \le s \le (\mu - 1)k/n$, then (4) holds; while if $s < k - n + \mu$,
then the left-hand side of (4) vanishes while the right-hand side is non-negative,
thus (4) holds for $s \le (\mu - 1)k/n$. The lemma is proved.
If we put $(s = 0, 1, \cdots, k)$

$$\binom{n-1}{k-1}\binom{n-\mu}{k-s} - \binom{n-1}{k}\binom{n-\mu}{k-s-1} = d_s,$$

then by Lemma 1,

$$d_s \gtreqless 0 \text{ according as } s \gtreqless (\mu - 1)k/n.$$






















This means that although the total number of terms of the form $q(\mu_1 \cdots \mu_s, \mu)$
is the same on both sides of (1), the left-hand side is more abundant in terms
with larger $s$ while the right-hand side is more abundant in terms with smaller $s$.
Now we have

$$q(\mu_1 \cdots \mu_i, \mu) > q(\mu_1' \cdots \mu_j', \mu)$$

if $i > j$ and if $(\mu_1' \cdots \mu_j')$ is a subset of $(\mu_1 \cdots \mu_i)$. Hence it is natural to suppose
that the left-hand side must be greater because it is more abundant in terms of
larger values. Unfortunately even if $i > j$, the last inequality is in general not
true if the set $(\mu_1' \cdots \mu_j')$ is not a sub-set of $(\mu_1 \cdots \mu_i)$. Therefore we cannot
as yet conclude that $\sigma^{(1)} \ge \sigma^{(2)}$.

To prove that actually we have $\sigma^{(1)} \ge \sigma^{(2)}$, we make the following "process of
compensation":

We have, by (2) and the definition of $d_s$, the following equality:

(5) $$d_0 + \binom{\mu-1}{1}d_1 + \cdots + \binom{\mu-1}{\mu-1}d_{\mu-1} = 0,$$

where $d_i = 0$ if $i > k$. Thus








$$d_s \le 0 \quad\text{for}\quad s \le k(\mu - 1)/n,$$

$$d_s \ge 0 \quad\text{for}\quad s > k(\mu - 1)/n.$$











For the fixed y, let 


a. ') c — *) (; — 9 a 
l C m % | k Yu + _ a q(u1 5 u) a 


i 
i Hi< 


oo? ee Se 


n—1)\{(n— n— p 
( k Ke si YG + E 5) 2 Mus 1) a ae 


#(,°72 > qu 


“** <eli<e 
so that 


(1) (er) (2) 2) 
Aa-a=o (uw), pa=o (pu). 
For un» = 1,1 = 0, we have 
o(1) = a” ae a” ai e111). 


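LEMMA 2: For $\mu > 1$ and $0 \le l < \mu - 1$, we have

$$\binom{\mu-1}{l}^{-1}\sum_{1 \le \mu_1 < \cdots < \mu_l < \mu} q(\mu_1 \cdots \mu_l, \mu) < \binom{\mu-1}{l+1}^{-1}\sum_{1 \le \mu_1 < \cdots < \mu_{l+1} < \mu} q(\mu_1 \cdots \mu_{l+1}, \mu).$$

PROOF: We have, for any $\nu < \mu$, $\nu \ne \mu_i$ $(i = 1, \cdots, l)$

$$q(\mu_1 \cdots \mu_l\nu, \mu) > q(\mu_1 \cdots \mu_l, \mu).$$

Summing with respect to all such $\nu$'s,

$$\sum_\nu q(\mu_1 \cdots \mu_l\nu, \mu) > (\mu - l - 1)\,q(\mu_1 \cdots \mu_l, \mu).$$

Summing with respect to all $1 \le \mu_1 < \cdots < \mu_l < \mu$,

$$\sum_{1 \le \mu_1 < \cdots < \mu_l < \mu}\sum_\nu q(\mu_1 \cdots \mu_l\nu, \mu) = (l + 1)\sum_{1 \le \mu_1 < \cdots < \mu_{l+1} < \mu} q(\mu_1 \cdots \mu_{l+1}, \mu)$$

$$> (\mu - l - 1)\sum_{1 \le \mu_1 < \cdots < \mu_l < \mu} q(\mu_1 \cdots \mu_l, \mu).$$

The lemma is proved.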
Now we use induction to prove that for uy > land! = 1,---,un- 1 


in + ( : Yas + s +(*7 ‘Y a 
(1) 2). oa d cath zi 


a a a 
l 
1Sui<***<ui<e 


This inequality holds for 1 = 1 by Lemma 2. Assume that it holds for 
l, (1 << p-— 1). Then we have, by (5) and the fact that each q < 0, 


glu. *** wi, mw) 2 Oz 

























(1) (2) (1) (2) 
pi+1 — pPiti1 = pi — pi + diss > q( un oo? iE’ 5 h) 
1spi<°**<ul+1<u 


iy + (" . ai + os +("7 ‘a 
—— a _ Dd ur s+ me, pw) 





1 
wp-—l 
( l ) + diss Do (ui ++ * miss, p) 
b+ ("Tat + ("7 a lt. 
2 | ——_—_—_ . een eeaeenemseneein pt ae > au «° * Misr, B) 
com | eto 
(‘7’) 
dy + (47 ') a+ o+. > re ‘at (ty 1) de 
OR scecrrrenessene Lo eee nase i, — =? > g(a: Mint, mw) = 0. 
w—l 
Gad) 


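Therefore, for $\mu > 1$, we have

$$\sigma^{(1)}(\mu) - \sigma^{(2)}(\mu) = \rho^{(1)}_{\mu-1} - \rho^{(2)}_{\mu-1} > 0.$$

Since $n > 1$ and $1 \le \mu \le n$, there exists a $\mu > 1$. Hence

$$\sum_{\mu=1}^{n}\sigma^{(1)}(\mu) > \sum_{\mu=1}^{n}\sigma^{(2)}(\mu)$$

which is equivalent to the inequality (1).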





3. Our next step will be to obtain a generalization of Theorem 1. Consider
a derived event defined by a disjunction of a (finite) number of events in the
set, as follows:

$$E_{\nu_1} + E_{\nu_2} + \cdots + E_{\nu_m}.$$

We call such a disjunction a disjunction of the m-th order.

DEFINITION 3: A set of events is said to be strongly mutually favorable in the 
second sense if for every positive integer m, the derived set of events consisting of 
all the disjunctions of the m-th order forms a strongly mutually favorable set of 
events (in the first sense). 

Let $D = D(m)$ denote in general a disjunction of the m-th order; let
$p(D_1 \cdots D_h, D)$ denote the probability of the occurrence of the disjunction
$D$, on the hypothesis that the conjunction of the h disjunctions $D_1 \cdots D_h$ has
occurred. Then Definition 3 says that for any positive integer h and any set
of distinct $D$'s we have

$$p(D_1 \cdots D_h, D) > p(D_1 \cdots D_{h-1}, D).$$

Since a disjunction of the 1st order is an event $E$, we see that Definition 3
includes Definition 2.





Let $D_m(\nu_1, \cdots, \nu_k)$, $\nu_1 < \cdots < \nu_k$, denote the derived event

$$\prod_{\mu_1, \cdots, \mu_m}(E_{\mu_1} + \cdots + E_{\mu_m})$$

where the product (conjunction) extends to all combinations of m indices
out of the indices $\nu_1, \cdots, \nu_k$. Let $p_m(\nu_1, \cdots, \nu_k)$ denote the probability of
the occurrence of $D_m(\nu_1, \cdots, \nu_k)$. It is seen that $p_1(\nu_1, \cdots, \nu_k) = p(\nu_1 \cdots \nu_k)$
in our previous notation.

We merely state Theorem 2, whose proof is analogous to that of Theorem 1 
but requires more cumbersome expressions. 


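THEOREM 2: Let $n > k \ge m \ge 1$ and let $E_1, \cdots, E_n$ be a set of mutually
strongly favorable events in the second sense. Then we have

$$\prod_{1 \le \nu_1 < \cdots < \nu_{k+1} \le n}[p_m(\nu_1, \cdots, \nu_{k+1})]^{\binom{n-m}{k-m+1}^{-1}} > \prod_{1 \le \nu_1 < \cdots < \nu_k \le n}[p_m(\nu_1, \cdots, \nu_k)]^{\binom{n-m}{k-m}^{-1}}.$$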


To give an interpretation of p,(m , ---, v), we prove the symbolic equation 
between events: 
Dm = II (E,, + pone + E,..) 


vy Sui <*** <emSrk 


i ~ (Eu, Sec = Crm» 
Vp Sei <*** <hk—mt1 Sk 
where product means conjunction and sum means disjunction. 

To prove this, we write for a general event E, E = 1 when E occurs, E = 0 when E does not occur. Now if $C_{k-m+1} = 0$, then at most k − m events among the k given events occur, so that there exist m events such that $E_{\mu_1} = 0, E_{\mu_2} = 0, \cdots, E_{\mu_m} = 0$, thus

$$E_{\mu_1} + E_{\mu_2} + \cdots + E_{\mu_m} = 0.$$

Now the last disjunction is contained in $D_m$ as a factor, therefore $D_m = 0$. Conversely, if $D_m = 0$, at least one of its factors is 0, so that there exist m events such that $E_{\mu_1} = 0, E_{\mu_2} = 0, \cdots, E_{\mu_m} = 0$. Thus at most k − m events out of the k given events occur and so by definition $C_{k-m+1} = 0$. Q.e.d.
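The symbolic equation just proved can also be checked mechanically for small k. The following is only an illustrative sketch: the function names are hypothetical, and events are represented as booleans (True when the event occurs), exactly as in the 0/1 convention of the proof.

```python
from itertools import combinations, product

def d_m(events, m):
    """Conjunction, over all m-subsets, of the disjunction of each subset."""
    return all(any(subset) for subset in combinations(events, m))

def c_at_least(events, j):
    """The event that at least j of the given events occur."""
    return sum(events) >= j

def identity_holds(k, m):
    """Check D_m = C_{k-m+1} for every 0/1 assignment of k events."""
    return all(
        d_m(events, m) == c_at_least(events, k - m + 1)
        for events in product((False, True), repeat=k)
    )
```

Calling `identity_holds(k, m)` for every pair with 1 ≤ m ≤ k and small k enumerates all cases of the proof.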
From the above it immediately follows that

$$p^*_m(\nu_1, \cdots, \nu_k) = p_{k-m+1}(\nu_1, \cdots, \nu_k)$$

where $p_{k-m+1}(\nu_1, \cdots, \nu_k)$ is defined in the introduction. Then Theorem 2 may be written as

$$\prod \left[p_{k-m+2}(\nu_1, \cdots, \nu_{k+1})\right]^{\binom{n-m}{k-m+1}^{-1}} > \prod \left[p_{k-m+1}(\nu_1, \cdots, \nu_k)\right]^{\binom{n-m}{k-m}^{-1}}$$

or again as

$$\prod \left[w_{m-1}(\nu_1, \cdots, \nu_{k+1})\right]^{\binom{n-m}{k-m+1}^{-1}} > \prod \left[w_{m-1}(\nu_1, \cdots, \nu_k)\right]^{\binom{n-m}{k-m}^{-1}}$$

where $w_{m-1}(\nu_1, \cdots, \nu_k)$ denotes the probability of the occurrence of at most m − 1 events out of the k events $E'_{\nu_1}, \cdots, E'_{\nu_k}$.


Remark. If in our Definitions 2 and 3 we replace the sign “>” by the sign “≥”, then we obtain the inequalities in Theorems 1 and 2 with the sign “>” replaced by “≥”. The corresponding set of events thus newly defined will be said to be strongly mutually favorable or indifferent (in the first or second sense).

After this modification, we can include events with the probability 1 in our 
considerations. Also, the events need no longer be distinct and there may 
now exist implication relations between events or their conjunctions. This 
modification is useful for the following application. 

4. Consider the divisibility of a random positive integer by the set of positive 
integers. To each positive integer there corresponds an event, namely the event 
that the random positive integer is divisible by it. The enumerable set of events 


$$E_1, E_2, E_3, E_4, \cdots, E_n, \cdots$$

where $E_n$ = the event of divisibility by n, with the probabilities

$$1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \cdots, \tfrac{1}{n}, \cdots$$

evidently forms a set of strongly mutually favorable or indifferent events in 
the second sense. 
Again, the enumerable set of events 


$$E'_1, E'_2, E'_3, E'_4, \cdots, E'_n, \cdots$$

where $E'_n$ = the event of non-divisibility by n, with the probabilities

$$0, \tfrac{1}{2}, \tfrac{2}{3}, \tfrac{3}{4}, \cdots, \tfrac{n-1}{n}, \cdots$$
evidently also forms a set of strongly mutually favorable or indifferent events 
in the second sense. 
Hence our Theorem 2 can be applied to both sets and in this way we obtain 
results which belong properly to the elementary theory of numbers. 
We shall content ourselves with indicating a few examples. 


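Let $\{a_1, \cdots, a_n\}$ denote the least common multiple of the natural numbers $a_1, \cdots, a_n$. Then Theorem 1, when applied to the two sets above, gives respectively

THEOREM 1.1: Let $a_1, \cdots, a_n$ be any positive integers, then we have, for $k = 1, \cdots, n-1$,

$$\prod_{1 \le \nu_1 < \cdots < \nu_{k+1} \le n} \left(\frac{1}{\{a_{\nu_1}, \cdots, a_{\nu_{k+1}}\}}\right)^{\binom{n-1}{k}^{-1}} \ge \prod_{1 \le \nu_1 < \cdots < \nu_k \le n} \left(\frac{1}{\{a_{\nu_1}, \cdots, a_{\nu_k}\}}\right)^{\binom{n-1}{k-1}^{-1}}.$$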
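The inequality of Theorem 1.1 can be tested in exact integer arithmetic for any particular set of integers. The sketch below is an illustration only (the function names are hypothetical): it clears the fractional exponents by raising both sides to the power $\binom{n-1}{k}\binom{n-1}{k-1}$ and inverting, so that the claim becomes a comparison of two products of least common multiples.

```python
from itertools import combinations
from math import comb, lcm, prod  # math.lcm requires Python 3.9 or later

def lcm_product(a, size):
    """Product of the l.c.m. of every subset of `a` with `size` elements."""
    return prod(lcm(*subset) for subset in combinations(a, size))

def theorem_1_1_holds(a, k):
    """Exact test of Theorem 1.1 for the integers in `a` and 1 <= k <= n-1.

    The stated inequality between reciprocals is equivalent to
    (prod of (k+1)-fold lcm's)^C(n-1,k-1) <= (prod of k-fold lcm's)^C(n-1,k).
    """
    n = len(a)
    lhs = lcm_product(a, k + 1) ** comb(n - 1, k - 1)
    rhs = lcm_product(a, k) ** comb(n - 1, k)
    return lhs <= rhs
```

For example, `all(theorem_1_1_holds([4, 6, 10, 15], k) for k in (1, 2, 3))` checks every admissible k for one set of four integers.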













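THEOREM 1.2: Also we have,

$$\prod_{1 \le \nu_1 < \cdots < \nu_{k+1} \le n} \left(1 - \sum_{\nu_1 \le \mu_1 \le \nu_{k+1}} \frac{1}{a_{\mu_1}} + \sum_{\nu_1 \le \mu_1 < \mu_2 \le \nu_{k+1}} \frac{1}{\{a_{\mu_1}, a_{\mu_2}\}} - \cdots + (-1)^{k+1} \frac{1}{\{a_{\nu_1}, \cdots, a_{\nu_{k+1}}\}}\right)^{\binom{n-1}{k}^{-1}}$$

$$\ge \prod_{1 \le \nu_1 < \cdots < \nu_k \le n} \left(1 - \sum_{\nu_1 \le \mu_1 \le \nu_k} \frac{1}{a_{\mu_1}} + \sum_{\nu_1 \le \mu_1 < \mu_2 \le \nu_k} \frac{1}{\{a_{\mu_1}, a_{\mu_2}\}} - \cdots + (-1)^{k} \frac{1}{\{a_{\nu_1}, \cdots, a_{\nu_k}\}}\right)^{\binom{n-1}{k-1}^{-1}}.$$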





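A trivial corollary of Theorem 1 is

$$p(1 2 \cdots n) \ge p_1 p_2 \cdots p_n.$$

Correspondingly we have

$$1 - \sum_{1 \le \mu_1 \le n} \frac{1}{a_{\mu_1}} + \sum_{1 \le \mu_1 < \mu_2 \le n} \frac{1}{\{a_{\mu_1}, a_{\mu_2}\}} - \cdots + (-1)^n \frac{1}{\{a_1, \cdots, a_n\}} \ge \left(1 - \frac{1}{a_1}\right)\left(1 - \frac{1}{a_2}\right) \cdots \left(1 - \frac{1}{a_n}\right).$$

If we multiply by $a_1 a_2 \cdots a_n$, we get

$$A(a_1, a_2, \cdots, a_n) \ge (a_1 - 1)(a_2 - 1) \cdots (a_n - 1),$$

where $A(a_1, \cdots, a_n)$ denotes the number of positive integers $\le a_1 a_2 \cdots a_n$ that are not divisible by any of the $a_i$ ($i = 1, \cdots, n$).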
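For small integers the last inequality can be confirmed by direct counting. The sketch below is an illustration only (the function names are hypothetical); it counts by brute force, so it is practical only when the product $a_1 a_2 \cdots a_n$ is small.

```python
from math import prod

def count_not_divisible(a):
    """A(a_1, ..., a_n): integers in 1..a_1*...*a_n divisible by no a_i."""
    limit = prod(a)
    return sum(1 for x in range(1, limit + 1) if all(x % ai for ai in a))

def counting_inequality_holds(a):
    """Check A(a_1, ..., a_n) >= (a_1 - 1)...(a_n - 1)."""
    return count_not_divisible(a) >= prod(ai - 1 for ai in a)
```

With coprime integers such as 2 and 3 the two sides coincide, which is the "indifferent" case of the Remark; with 4 and 6 the inequality is strict.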

This last result, which is almost obvious here, was proved by H. Rohrbach and H. Heilbronn independently.² See also my generalization³ (also obvious from the present point of view) of this result to higher dimensional sets of positive integers and to sets of ideals in any algebraic number field.





2 “Beweis einer zahlentheoretische Ungleichung,” Jour. für Math., Vol. 177 (1937), pp. 193-196. “On an inequality in the elementary theory of numbers,” Proc. Camb. Phil. Soc., Vol. 33 (1937), pp. 207-209.

3 “A generalization of an inequality in the elementary theory of numbers,” Jour. für Math., Vol. 183 (1941), p. 103.












OBSERVATIONS ON ANALYSIS OF VARIANCE THEORY 


By Hilda Geiringer¹


Bryn Mawr College 


One of the important problems of theoretical statistics is the following. Let $x_1, x_2, \cdots x_N$ be the results of N observations; by means of these results we want to test the hypothesis that $V_i(x)$ is the distribution of the ith chance variable $x_i$. In that situation we often decide to choose a test function $F(x_1, x_2, \cdots x_N)$ and to determine the distribution of F under the above assumption. By means of this distribution we compute the probability of $\xi_1 \le F \le \xi_2$ and compare this result with the observed value of F.

Suppose there are m groups, each of n observations on m·n chance variables $x_{\mu\nu}$. We may test hypotheses regarding the mn distributions of the $x_{\mu\nu}$ in the way just mentioned. In analysis of variance theory we often use as test functions certain quadratic forms $s_w^2$ and $s_a^2$ (“variance within” and “among classes”) and their quotient (multiplied by m(n − 1)/(m − 1)), usually denoted by z. Its distribution has been investigated by R. A. Fisher [2] under the assumption that the chance variables are mutually independent and subject to the same normal law. “The five per cent and one per cent points of this distribution have been tabulated by R. A. Fisher and are used to test whether these two estimates of the same magnitude are significantly different. One gets thus a test of significance to test whether our sample is a random sample from a homogeneous normal population or not.² If the probability of a certain z-value is too small we shall reject the hypothesis that the sample is a random sample from a homogeneous normal population” [5].

The use of Fisher’s z-test is also recommended if we may reasonably assume that the theoretical distributions are approximately normal. “Unless some rather startling lack of normality is known or suspected analysis of variance may be used with confidence.” This last remark can be understood by considering
that, as we will see in detail, some of the basic results of our theory are inde- 
pendent of the normality of the populations. It is however this assumption of 
normality which makes possible the complete and elegant solution of the problem 
of distribution obtained by R. A. Fisher. 

If it is not possible to determine the exact distribution of a test function under 
sufficiently general assumptions we may: 

a) make simple and particular assumptions concerning the populations 

b) investigate an asymptotic solution of the problem, i.e. determine the distributions of the test functions for large samples,³ or

c) study the mathematical expectations and the variances of the test functions 


1 Research under a grant in aid of the American Philosophical Society. 
2 My italics. 
3 cf. statement (a) page 355. 







for small samples under appropriately general assumptions regarding the popu- 
lations (this should be done independently of concepts of estimation, unbiased 
estimate etc.). 

This last procedure provides us with tests which suffice in actual practice.⁴

It is well known that the expectations of the two forms $s_a^2/(m-1)$ and $s_w^2/m(n-1)$ are the same even if the populations are not normal, provided they are equal to each other (Bernoulli series). In addition we shall prove the theorem, familiar in case of the Lexis quotient [9], that under these conditions the expectation of their quotient equals unity (Section 1, (b)). The next step consists in investigating certain inequalities characteristic of Lexis or Poisson series. The different criteria will be completed by the computation of the respective variances (Section 1, (c)).

In addition to the above mentioned test functions other symmetrical test 
functions have been considered [5]. In studying these we shall again assume 
general populations. It will be seen that the Lexis as well as the Poisson series 
may be characterized by equalities (instead of inequalities) (Section 2, (a)), and 
we can generalize our theorem on the expectation of the quotient (Section 2, (b)) 
to this case. Then the variances of these test functions will be investigated. 

It seems worthwhile to omit the assumption of independence of the chance 
variables and to study different kinds of mutual dependence. These investigations lead to interesting relations among the expectations⁵ (Section 2, (c)).
They seem to be related to Fisher’s “intraclass correlation” and to supplement 
his idea. 

Most of the results of Sections 1 and 2 can be generalized to the analysis of 
covariance (section 3). 


1. Variance within and among classes. 


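(a). The test functions. Let $x_{\mu\nu}$ ($\mu = 1, \cdots m$; $\nu = 1, \cdots n$) be m·n chance variables and put

$$a_\mu = \frac{1}{n}\sum_{\nu=1}^{n} x_{\mu\nu}, \qquad \bar a_\nu = \frac{1}{m}\sum_{\mu=1}^{m} x_{\mu\nu}, \qquad a = \frac{1}{mn}\sum_{\mu=1}^{m}\sum_{\nu=1}^{n} x_{\mu\nu} = \frac{1}{m}\sum_{\mu=1}^{m} a_\mu = \frac{1}{n}\sum_{\nu=1}^{n} \bar a_\nu. \tag{1}$$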


4 The important paper of Irwin [5] assumes normality of the populations. H. L. Rietz 
[8] computes the expectations of $s_a^2$ and $s_w^2$ under rather general assumptions for the populations and considers the cases of Bernoulli, Lexis, Poisson series, but does not consider tests
of significance; nor does he consider the symmetric test functions (section 2 of this paper). 
In later papers on our subject the assumption of normal and independent populations 
recurs. Another approach [11] in the problem of analysis of variance is to use ranks instead 
of the actual values (this has been pointed out by the referee to the author, who is very 
grateful for this comment). 

5 They generalize previous results of the author. 











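We then introduce the three quadratic forms

$$s^2 = \sum_\mu\sum_\nu (x_{\mu\nu} - a)^2; \qquad s_a^2 = n\sum_\mu (a_\mu - a)^2; \qquad s_w^2 = \sum_\mu\sum_\nu (x_{\mu\nu} - a_\mu)^2, \tag{2}$$

with the respective ranks (degrees of freedom)

$$r = mn - 1, \qquad r_a = m - 1, \qquad r_w = m(n - 1). \tag{3}$$

Then we have

$$s^2 = s_a^2 + s_w^2, \qquad r = r_a + r_w. \tag{4}$$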

The m·n theoretical distributions are assumed in this section to be independent of each other. Let $V_{\mu\nu}(x)$ be the probability that $x_{\mu\nu} \le x$ and

$$\alpha_{\mu\nu} = \int x \, dV_{\mu\nu}(x), \qquad \sigma_{\mu\nu}^2 = \int (x - \alpha_{\mu\nu})^2 \, dV_{\mu\nu}(x), \tag{5}$$

where the integrals are Stieltjes integrals; thus the $V_{\mu\nu}(x)$ may be e.g. general arithmetical or geometrical distributions.⁶


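Let us compute the mathematical expectation of the three test functions with respect to the m·n-dimensional distribution $V_{11}(x_{11}) V_{12}(x_{12}) \cdots V_{mn}(x_{mn})$:

$$E[F(x_{11}, \cdots, x_{mn})] = \int \cdots \int F(x_{11}, \cdots, x_{mn}) \, dV_{11}(x_{11}) \cdots dV_{mn}(x_{mn}). \tag{6}$$

We have then

$$E\left[\frac{s^2}{mn-1}\right] = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{1}{mn-1}\sum\sum (\alpha_{\mu\nu} - \alpha)^2, \tag{7}$$

$$E\left[\frac{s_a^2}{m-1}\right] = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{n}{m-1}\sum_\mu (\alpha_\mu - \alpha)^2, \tag{8}$$

$$E\left[\frac{s_w^2}{m(n-1)}\right] = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{1}{m(n-1)}\sum\sum (\alpha_{\mu\nu} - \alpha_\mu)^2, \tag{9}$$

where $\alpha_\mu = \frac{1}{n}\sum_\nu \alpha_{\mu\nu}$ and $\alpha = \frac{1}{mn}\sum\sum \alpha_{\mu\nu}$ are the averages of the theoretical mean values.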


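From these equalities we deduce:
1. If the m·n theoretical mean values $\alpha_{\mu\nu}$ are all equal (Bernoulli series), then the expectations in (7), (8), (9) are equal; i.e.

$$E_B\left(\frac{s^2}{mn-1}\right) = E_B\left(\frac{s_a^2}{m-1}\right) = E_B\left(\frac{s_w^2}{m(n-1)}\right) = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2. \tag{10}$$

2. If the $\alpha_{\mu\nu}$ are equal “by rows” but differ from row to row (Lexis series), i.e. $\alpha_{\mu\nu} = \alpha_\mu$ but $\alpha_\mu \ne \alpha$, then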

6 $V_{\mu\nu}(x)$ is a monotone non-decreasing function. Hence it has at most a denumerable set of ordinary jump discontinuities; at such a point it is continuous to the right but not to the left. Moreover it possesses a finite derivative $v_{\mu\nu}(x)$ almost everywhere.







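$$E_L\left[\frac{s_a^2}{m-1} - \frac{s^2}{mn-1}\right] = \frac{mn(n-1)}{(m-1)(mn-1)}\sum_\mu (\alpha_\mu - \alpha)^2 > 0, \tag{11}$$

$$E_L\left[\frac{s_a^2}{m-1} - \frac{s_w^2}{m(n-1)}\right] = \frac{n}{m-1}\sum_\mu (\alpha_\mu - \alpha)^2 > 0. \tag{12}$$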


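3. If the $\alpha_{\mu\nu}$ are equal “by columns” but differ from column to column (Poisson series), then $\alpha_{\mu\nu} = \bar\alpha_\nu$, $\alpha_\mu = \alpha$ (where $\bar\alpha_\nu = \frac{1}{m}\sum_\mu \alpha_{\mu\nu}$) and

$$E_P\left[\frac{s_a^2}{m-1} - \frac{s^2}{mn-1}\right] = -\frac{m}{mn-1}\sum_\nu (\bar\alpha_\nu - \alpha)^2 < 0, \tag{13}$$

$$E_P\left[\frac{s_a^2}{m-1} - \frac{s_w^2}{m(n-1)}\right] = -\frac{1}{n-1}\sum_\nu (\bar\alpha_\nu - \alpha)^2 < 0. \tag{14}$$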


In the Lexis theory⁷ we speak of normal, supernormal or subnormal dispersion depending on whether the observed value of $s_a^2/(m-1)$ is equal to, greater than or less than that of $s^2/(mn-1)$, and we usually consider the quotient

$$L = \frac{s_a^2}{m-1} \Big/ \frac{s^2}{mn-1}. \tag{15}$$

In analysis of variance theory we usually compare $s_a^2/(m-1)$ (variance among rows) with $s_w^2/m(n-1)$ (variance within rows) and introduce the quotient

$$V = \frac{s_a^2}{m-1} \Big/ \frac{s_w^2}{m(n-1)}. \tag{16}$$


It follows from (4): If $L \gtreqless 1$ then $V \gtreqless 1$ and conversely. We may therefore speak of normal or non-normal dispersion with respect either to L or to V.
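The decomposition (4) and the agreement between the two quotients can be illustrated numerically. The following is a minimal sketch, not part of the original development: the function names and the small table are hypothetical, and the table is stored as a list of m rows of n observations.

```python
def anova_forms(x):
    """Return (s^2, s_a^2, s_w^2) of (2) for an m-by-n table x."""
    m, n = len(x), len(x[0])
    row_means = [sum(row) / n for row in x]
    grand = sum(row_means) / m
    s2 = sum((v - grand) ** 2 for row in x for v in row)
    s2_a = n * sum((a - grand) ** 2 for a in row_means)
    s2_w = sum((v - a) ** 2 for row, a in zip(x, row_means) for v in row)
    return s2, s2_a, s2_w

def quotients(x):
    """Return the Lexis quotient L of (15) and the quotient V of (16)."""
    m, n = len(x), len(x[0])
    s2, s2_a, s2_w = anova_forms(x)
    lexis = (s2_a / (m - 1)) / (s2 / (m * n - 1))
    variance_ratio = (s2_a / (m - 1)) / (s2_w / (m * (n - 1)))
    return lexis, variance_ratio

# Hypothetical 3-by-3 table of observations.
table = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 5.0, 8.0]]
s2, s2_a, s2_w = anova_forms(table)
L, V = quotients(table)
```

For this table the three forms are 40, 24 and 16, so that (4) holds, and L and V lie on the same side of unity.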

The results given by equations (10) to (14) can be expressed as follows: If the m·n theoretical distributions are all equal the mathematical expectations of $s^2/r$, of $s_a^2/r_a$ and of $s_w^2/r_w$ are equal. In the case of a Lexis series the expectation of $s_a^2/r_a$ is greater than that of $s^2/r$ and greater than that of $s_w^2/r_w$, and in the case of a Poisson series the opposite is true.

We generally use these facts in order to make inferences about the unknown populations $V_{\mu\nu}(x)$ from the observed values of our test functions. If e.g., the observed value of $s_a^2/r_a$ is “significantly”⁸ greater than that of $s^2/r$ we may assume that the theoretical distributions form a Lexis series. But of course such a significant deviation can also be explained by quite different assumptions regarding the populations (see Section 2, (c)).

(b). Mathematical expectation of the quotient of the test functions. We are going to prove in this section a theorem of some mathematical interest. This theorem is a generalization of an analogous theorem in the Lexis theory [9].


7 The relation between these considerations and the Lexis theory will be dealt with in 
another paper. 


8 The meaning of the word “significantly” has of course still to be explained.













We have seen (10) that the mathematical expectations, defined by (6), of the three test functions

$$S = \frac{s^2}{mn-1}, \qquad S' = \frac{s_a^2}{m-1}, \qquad S'' = \frac{s_w^2}{m(n-1)}$$

are equal if the m·n populations are equal (i.e. have identical distributions). We will show that even in this case

$$E\left(\frac{S'}{S}\right) = 1, \qquad E\left(\frac{S''}{S}\right) = 1. \tag{17}$$

Let us put m·n = N, and let the N chance variables be arranged in a one-dimensional sequence. As S' and S are of second degree in the $x_\nu$ ($\nu = 1, 2, \cdots N$) we may write

$$S' - S = A + \sum_\nu B_\nu x_\nu + \sum_\nu C_\nu x_\nu^2 + \sum_{\nu < \rho} D_{\nu\rho} x_\nu x_\rho,$$

where the A, $B_\nu$, $C_\nu$ and $D_{\nu\rho}$ are constants. Now form the expectation, defined by (6), of (S' − S) under the assumption that the N populations are equal, $V_\nu(x) = V(x)$ ($\nu = 1 \cdots N$). Denoting by α and σ² the mean value and variance of V(x) and putting $\sum B_\nu = B$, $\sum C_\nu = C$, $\sum D_{\nu\rho} = D$ we find

$$E(S' - S) = A + B\alpha + C(\sigma^2 + \alpha^2) + D\alpha^2 = 0.$$

And as this equality holds for an arbitrary distribution V(x), we deduce that A = B = C = D = 0. Let us then compute under the same assumption the expectation of (S' − S)/S. Now the expectations of 1/S, $x_\nu/S$, $x_\nu^2/S$, $x_\nu x_\rho/S$ take the place of the expectations of 1, $x_\nu$, $x_\nu^2$, $x_\nu x_\rho$. But these new expectations are also independent of the index ν, because of the equality of the N populations and the symmetry of S in the N variables $x_1, \cdots x_N$. Hence we may write

$$E\left(\frac{1}{S}\right) = \mu_0, \qquad E\left(\frac{x_\nu}{S}\right) = \mu_1, \qquad E\left(\frac{x_\nu^2}{S}\right) = \mu_2, \qquad E\left(\frac{x_\nu x_\rho}{S}\right) = \mu_3,$$

and we find

$$E\left(\frac{S' - S}{S}\right) = E\left(\frac{S'}{S} - 1\right) = A\mu_0 + B\mu_1 + C\mu_2 + D\mu_3 = 0,$$

because A = B = C = D = 0. Hence E(S'/S) = 1.

We may prove in the same way that E(S''/S) = 1.

We have however proved (17) only under the assumption that all the N populations are equal, whereas (10) is true under the mere hypothesis that the mean values of the populations $V_\nu(x)$ are the same.

(c). The variances of the test functions. The distribution of our test functions and of their quotients V or L have been determined and tabulated by R. A. Fisher under the hypothesis that the m·n chance variables are independent and obey the same normal Gaussian law. Consequently by means of Fisher’s distribution we can test only the hypothesis that the theoretical populations have both these properties.
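That the quotient theorem (17) of part (b) does not depend on normality can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the common population is taken to be exponential (clearly non-normal), and the expectation is approximated by an average over repeated samples, so the result is close to unity rather than exactly equal to it.

```python
import random

def quotient_s_prime_over_s(x):
    """S'/S for an m-by-n table x, with S = s^2/(mn-1), S' = s_a^2/(m-1)."""
    m, n = len(x), len(x[0])
    row_means = [sum(row) / n for row in x]
    grand = sum(row_means) / m
    s2 = sum((v - grand) ** 2 for row in x for v in row)
    s2_a = n * sum((a - grand) ** 2 for a in row_means)
    return (s2_a / (m - 1)) / (s2 / (m * n - 1))

def mean_quotient(m, n, trials, seed=0):
    """Average of S'/S over `trials` tables drawn from one exponential law."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        table = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(m)]
        total += quotient_s_prime_over_s(table)
    return total / trials
```

Since $s_a^2 \le s^2$, the quotient S'/S never exceeds (mn − 1)/(m − 1), so the average settles quickly; `mean_quotient(3, 4, 20000)` may be compared with 1.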

If in a statistical problem it is not possible to determine the exact distributions 

of the test functions under sufficiently general assumptions regarding the popula- 
tions, one of the following procedures is frequently used: 
a) one tries to find an asymptotic solution of the problem, i.e. to determine the 
distribution of the test functions in question for large samples. The distribution 
of the analysis of variance quotient, as n tends to infinity, has been established 
by W. G. Madow [6]. The same problem for the Lexis quotient was solved as 
early as 1873 by Helmert [4]. As m tends to infinity the limiting distribution 
is a Gaussian distribution, which follows from general theorems of v. Mises [7]. 
b) For small samples, i.e. if m and n are finite we may determine the expecta- 
tions and the variances of the test functions for appropriately general popula- 
tions and establish in this way a test of significance. 

In this section we shall compute the variances of our test functions. Let us first assume arbitrary but equal populations $V_\nu(x) = V(x)$ and denote by $M_i$ the ith moment about the mean of V(x):

$$M_i = \int (x - \alpha)^i \, dV(x), \qquad \alpha = \int x \, dV(x), \qquad M_2 = \sigma^2. \tag{18}$$

Then we find immediately the variance of $S = s^2/(mn-1)$, using a well-known formula for the variance of a sample variance:

$$\mathrm{Var}(S) = \mathrm{Var}\left(\frac{s^2}{mn-1}\right) = \frac{1}{mn}\left[M_4 - \frac{mn-3}{mn-1}M_2^2\right]. \tag{19}$$

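If we need the analogous variance in case of different populations we let

$$q^2 = \frac{1}{r}\sum_{\rho=1}^{r} (y_\rho - b)^2 \quad \text{where} \quad b = \frac{1}{r}(y_1 + \cdots + y_r)$$

and let $V_\rho(y)$ ($\rho = 1, \cdots r$) be r populations, and

$$\beta_\rho = \int y \, dV_\rho(y), \qquad \frac{1}{r}\sum_{\rho=1}^{r} \beta_\rho = \beta,$$

$$\int (y - \beta_\rho)^i \, dV_\rho(y) = \mu_i^{(\rho)} \quad (i = 1, 2, \cdots; \ \rho = 1, 2, \cdots r), \qquad \mu_2^{(\rho)} = \sigma_\rho^2.$$

Then the following formula may be used:

$$\mathrm{Var}(q^2) = \left(\frac{r-1}{r^2}\right)^2 \sum_{\rho=1}^{r} \left[\mu_4^{(\rho)} - \sigma_\rho^4\right] + \frac{4(r-1)}{r^3}\sum_{\rho=1}^{r} \mu_3^{(\rho)}(\beta_\rho - \beta) + \frac{4}{r^2}\sum_{\rho=1}^{r} \sigma_\rho^2(\beta_\rho - \beta)^2 + \frac{4}{r^4}\sum_{\rho<\tau} \sigma_\rho^2\sigma_\tau^2. \tag{20}$$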









We may check (20) by putting the $V_\rho(y)$ all equal to V(y) and find

$$\mathrm{Var}(q^2) = \frac{r-1}{r^3}\left[(r-1)\mu_4 - (r-3)\sigma^4\right], \tag{20'}$$


in accordance with (19). 
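A formula for the variance of a sample variance under unequal populations, of the kind used in (20), can be checked in exact rational arithmetic by enumerating every joint outcome of a few small discrete populations. The sketch below is an illustration only; the function names and the three populations are hypothetical, and the sample variance is taken with divisor r, i.e. as the mean of the squared deviations from the sample average b.

```python
from fractions import Fraction as F
from itertools import product

def moments(pop):
    """Mean and 2nd, 3rd, 4th central moments of a discrete population
    given as a list of (value, probability) pairs."""
    mean = sum(p * v for v, p in pop)
    def central(i):
        return sum(p * (v - mean) ** i for v, p in pop)
    return mean, central(2), central(3), central(4)

def exact_variance(pops):
    """Variance of (1/r) * sum (y - b)^2 over all joint outcomes."""
    r = len(pops)
    first = second = F(0)
    for outcome in product(*pops):
        prob = F(1)
        for _, p in outcome:
            prob *= p
        ys = [v for v, _ in outcome]
        b = sum(ys) / r
        q2 = sum((y - b) ** 2 for y in ys) / r
        first += prob * q2
        second += prob * q2 * q2
    return second - first * first

def formula_variance(pops):
    """The four-term expression: fourth moments, third moments times
    mean deviations, variances times squared mean deviations, and
    products of pairs of variances."""
    r = len(pops)
    ms = [moments(pop) for pop in pops]
    beta = sum(m[0] for m in ms) / r
    t1 = F(r - 1, r * r) ** 2 * sum(m[3] - m[1] ** 2 for m in ms)
    t2 = F(4 * (r - 1), r ** 3) * sum(m[2] * (m[0] - beta) for m in ms)
    t3 = F(4, r ** 2) * sum(m[1] * (m[0] - beta) ** 2 for m in ms)
    t4 = F(4, r ** 4) * sum(ms[i][1] * ms[j][1]
                            for i in range(r) for j in range(i + 1, r))
    return t1 + t2 + t3 + t4

# Three hypothetical, deliberately unequal and skew populations.
pops = [
    [(F(0), F(1, 2)), (F(1), F(1, 2))],
    [(F(0), F(1, 4)), (F(2), F(3, 4))],
    [(F(-1), F(1, 3)), (F(0), F(1, 3)), (F(3), F(1, 3))],
]
```

Because all arithmetic is done with `Fraction`, `exact_variance(pops) == formula_variance(pops)` is an exact comparison rather than a numerical approximation.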
In order to determine the variance of $s_a^2$ by means of these formulae we consider $\frac{1}{m}\sum_\mu (a_\mu - a)^2$ as a sample variance. The n distributions in the μth row are $V_{\mu 1}(x), V_{\mu 2}(x), \cdots V_{\mu n}(x)$, or, if we assume that they are all equal, simply $V(x) = V(x_{\mu\nu})$. Let us put $x_{\mu\nu}/n = z_{\mu\nu}$ and $V(x_{\mu\nu}) = V'(z_{\mu\nu})$, and denote by $W(a_\mu)$ the distribution of the average of the elements in the μth row:

$$W(a_\mu) = \int \cdots \int dV'(z_{\mu 1}) \, dV'(z_{\mu 2}) \cdots dV'(z_{\mu, n-1}) \, V'(a_\mu - z_{\mu 1} - \cdots - z_{\mu, n-1}).$$

There is such a distribution for each row, and we have to find the variance of $\sum_\mu (a_\mu - a)^2$ with respect to the combination of these m distributions. In order to be able to apply (20') we need the second and fourth moments of these distributions. We have for the mean value α' of $W(a_\mu)$:

$$\alpha' = n \cdot (\text{mean value of } V') = n \cdot \frac{1}{n}\alpha = \alpha$$

and for the variance $\mu_2'$ of $W(a_\mu)$: $\mu_2' = \sigma^2/n$. We still need $\mu_4'$. By repeated use of the formula

$$\int\!\!\int \left[(x_1 - \alpha_1) + (x_2 - \alpha_2)\right]^4 dV_1(x_1) \, dV_2(x_2) = \int (x_1 - \alpha_1)^4 \, dV_1(x_1) + \int (x_2 - \alpha_2)^4 \, dV_2(x_2) + 6\int (x_1 - \alpha_1)^2 \, dV_1(x_1) \int (x_2 - \alpha_2)^2 \, dV_2(x_2),$$

and of the fact that $W(a_\mu)$ is simply the distribution of the sum of n variables $z_{\mu\nu}$, we get:

$$\mu_4' = \frac{1}{n^4}\left[nM_4 + 6\,\frac{n(n-1)}{2}M_2^2\right] = \frac{1}{n^3}\left[M_4 + 3(n-1)M_2^2\right],$$

where $M_4$ and $M_2$ are the values introduced in (18).
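We now apply (20') and get

$$\mathrm{Var}\left[\frac{1}{m}\sum_\mu (a_\mu - a)^2\right] = \frac{m-1}{m^3}\left[(m-1)\mu_4' - (m-3)\mu_2'^2\right],$$

and substituting the values of $\mu_2'$ and $\mu_4'$, we find by an easy computation the final result:

$$\mathrm{Var}\left[\frac{n}{m-1}\sum_\mu (a_\mu - a)^2\right] = \frac{1}{mn}(M_4 - 3M_2^2) + \frac{2}{m-1}M_2^2. \tag{21}$$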


If we compare this last formula with (19) we see that the right side in (21) is of order 1/m, whereas that in (19) is of order 1/mn. Therefore, for sufficiently large values of n, $s^2/r$ will be “more exact” than $s_a^2/r_a$. In some presentations of the Lexis theory it is implied that the value $s_a^2/r_a$ is to be compared with the theoretical or exact value $s^2/r$; we may see a certain justification for this idea in the result just mentioned. This may lead us also to use $s^2/r$ as an unbiased estimate of the unknown population variance if $\alpha_{\mu\nu} = \alpha$ (see (7) and (8)).

By means of the simple formulae (19) and (21) we can now easily test whether the values of $s^2/r$ and $s_a^2/r_a$, whose expectations are equal in case of equal populations, differ significantly from each other. Of course we must compute as usual approximate values of $M_2$ and $M_4$ from the observations. If n is comparatively large, as it usually is e.g. in the Lexis theory, only the term $\frac{2}{m-1}M_2^2$ will be significant. If the hypothetical population is Gaussian ($M_4 = 3M_2^2$) the right side of (21) reduces to $\frac{2}{m-1}M_2^2$ and that of (19) to $\frac{2M_2^2}{mn-1}$; hence these variances are in the ratio of $\frac{1}{m-1} \big/ \frac{1}{mn-1}$, as one might expect.


2. Symmetric Test Functions. 


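(a). New equalities for Lexis and Poisson series. In Section 1, starting with the formula $s^2 = s_a^2 + s_w^2$, we used the test functions $s^2/r$, $s_a^2/r_a$, $s_w^2/r_w$. This implied a difference between rows and columns, which is often justified, e.g. in the Lexis theory. The following decomposition of $s^2$ is symmetric with respect to rows and columns. Let

$$a_\mu = \frac{1}{n}\sum_{\nu=1}^{n} x_{\mu\nu}, \qquad \bar a_\nu = \frac{1}{m}\sum_{\mu=1}^{m} x_{\mu\nu}, \qquad a = \frac{1}{m}\sum_{\mu=1}^{m} a_\mu = \frac{1}{n}\sum_{\nu=1}^{n} \bar a_\nu,$$

$$s^2 = \sum\sum (x_{\mu\nu} - a)^2, \qquad s_a^2 = n\sum (a_\mu - a)^2, \qquad s_w^2 = \sum\sum (x_{\mu\nu} - a_\mu)^2,$$

$$S^2 = \sum\sum (x_{\mu\nu} - a_\mu - \bar a_\nu + a)^2, \qquad \bar s_a^2 = m\sum (\bar a_\nu - a)^2, \qquad \bar s_w^2 = \sum\sum (x_{\mu\nu} - \bar a_\nu)^2$$

with the respective ranks

$$r = mn - 1, \qquad r_a = m - 1, \qquad r_w = m(n - 1),$$

$$R = (m - 1)(n - 1), \qquad \bar r_a = n - 1, \qquad \bar r_w = n(m - 1). \tag{3}$$

Then

$$s^2 = s_a^2 + \bar s_a^2 + S^2 = s_a^2 + s_w^2 = \bar s_a^2 + \bar s_w^2 \tag{5}$$

and

$$r = r_a + \bar r_a + R = r_a + r_w = \bar r_a + \bar r_w. \tag{6}$$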
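The symmetric decomposition (5) and the corresponding decomposition (6) of the ranks can be verified exactly on any table. The following sketch is an illustration only: the function name, the dictionary keys and the small table are hypothetical, and rational arithmetic is used so that the identities hold exactly rather than up to rounding.

```python
from fractions import Fraction as F

def symmetric_forms(x):
    """Return the six quadratic forms of this section for an m-by-n table."""
    m, n = len(x), len(x[0])
    a_row = [sum(row) / n for row in x]                       # a_mu
    a_col = [sum(x[i][j] for i in range(m)) / m for j in range(n)]  # a-bar_nu
    a = sum(a_row) / m                                        # grand mean
    cells = [(i, j) for i in range(m) for j in range(n)]
    return {
        "s2": sum((x[i][j] - a) ** 2 for i, j in cells),
        "s2_a": n * sum((r - a) ** 2 for r in a_row),
        "s2_a_bar": m * sum((c - a) ** 2 for c in a_col),
        "s2_w": sum((x[i][j] - a_row[i]) ** 2 for i, j in cells),
        "s2_w_bar": sum((x[i][j] - a_col[j]) ** 2 for i, j in cells),
        "S2": sum((x[i][j] - a_row[i] - a_col[j] + a) ** 2 for i, j in cells),
    }

# Hypothetical 2-by-3 table with rational entries.
table = [[F(v) for v in row] for row in [[1, 4, 2], [0, 3, 7]]]
forms = symmetric_forms(table)
```

The three ways of splitting `forms["s2"]` listed in (5) then give identical totals.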

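We find the expectations of these forms under the assumption of arbitrary populations $V_{\mu\nu}(x)$ which are independent and different from each other. We then specialize for Bernoulli series, Lexis and Poisson series of populations respectively. Denoting by $\alpha_{\mu\nu}$ and $\sigma_{\mu\nu}^2$ the mean value and variance of $V_{\mu\nu}(x)$ and by

$$\alpha_\mu = \frac{1}{n}\sum_\nu \alpha_{\mu\nu}, \qquad \bar\alpha_\nu = \frac{1}{m}\sum_\mu \alpha_{\mu\nu}, \qquad \alpha = \frac{1}{m}\sum_\mu \alpha_\mu = \frac{1}{n}\sum_\nu \bar\alpha_\nu,$$

we find for the expected values defined in (6) Section 1:

$$E\left[\frac{s^2}{mn-1}\right] = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{1}{mn-1}\sum\sum (\alpha_{\mu\nu} - \alpha)^2,$$

$$E\left[\frac{s_a^2}{m-1}\right] = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{n}{m-1}\sum (\alpha_\mu - \alpha)^2,$$

$$E\left[\frac{\bar s_a^2}{n-1}\right] = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{m}{n-1}\sum (\bar\alpha_\nu - \alpha)^2, \tag{7}$$

$$E\left[\frac{S^2}{(m-1)(n-1)}\right] = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{1}{(m-1)(n-1)}\sum\sum (\alpha_{\mu\nu} - \alpha_\mu - \bar\alpha_\nu + \alpha)^2,$$

$$E\left[\frac{s_w^2}{m(n-1)}\right] = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{1}{m(n-1)}\sum\sum (\alpha_{\mu\nu} - \alpha_\mu)^2,$$

$$E\left[\frac{\bar s_w^2}{n(m-1)}\right] = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{1}{n(m-1)}\sum\sum (\alpha_{\mu\nu} - \bar\alpha_\nu)^2.$$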

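In the Bernoulli case, which as far as the author knows is the only one which has been considered in this connection [5], we get the well-known result:

$$E_B\left[\frac{s^2}{mn-1}\right] = E_B\left[\frac{s_a^2}{m-1}\right] = E_B\left[\frac{\bar s_a^2}{n-1}\right] = E_B\left[\frac{s_w^2}{m(n-1)}\right] = E_B\left[\frac{\bar s_w^2}{n(m-1)}\right] = E_B\left[\frac{S^2}{(m-1)(n-1)}\right]. \tag{8}$$

Now let us assume a Lexis series, with

$$\alpha_{\mu\nu} = \alpha_\mu, \quad \alpha_\mu \ne \alpha; \qquad \bar\alpha_\nu = \alpha, \quad \sigma_{\mu\nu} = \sigma_\mu. \tag{9}$$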


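Then (7) reduces to

$$E_L\left[\frac{s^2}{mn-1}\right] = \frac{1}{m}\sum \sigma_\mu^2 + \frac{n}{mn-1}\sum (\alpha_\mu - \alpha)^2, \qquad E_L\left[\frac{s_a^2}{m-1}\right] = \frac{1}{m}\sum \sigma_\mu^2 + \frac{n}{m-1}\sum (\alpha_\mu - \alpha)^2,$$

$$E_L\left[\frac{\bar s_a^2}{n-1}\right] = E_L\left[\frac{S^2}{(m-1)(n-1)}\right] = E_L\left[\frac{s_w^2}{m(n-1)}\right] = \frac{1}{m}\sum \sigma_\mu^2, \tag{10}$$

$$E_L\left[\frac{\bar s_w^2}{n(m-1)}\right] = \frac{1}{m}\sum \sigma_\mu^2 + \frac{1}{m-1}\sum (\alpha_\mu - \alpha)^2.$$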


From these formulae we deduce, besides the inequalities (11), (12) of Section 1 and the corresponding formulae where the role of rows and columns is interchanged, the further inequality:

$$E_L\left[\frac{s_a^2}{m-1}\right] > E_L\left[\frac{\bar s_a^2}{n-1}\right]. \tag{11}$$

But there are also characteristic equalities, namely:

$$E_L\left[\frac{\bar s_a^2}{n-1}\right] = E_L\left[\frac{S^2}{(m-1)(n-1)}\right] = E_L\left[\frac{s_w^2}{m(n-1)}\right]. \tag{12}$$

These equalities⁹ seem often to be more appropriate than the usual inequalities in testing the hypothesis of a Lexis series.

Let us finally consider the Poisson case which is very often neglected. There we have:

$$\alpha_{\mu\nu} = \bar\alpha_\nu, \quad \bar\alpha_\nu \ne \alpha, \quad \alpha_\mu = \alpha, \quad \sigma_{\mu\nu} = \bar\sigma_\nu. \tag{13}$$

Then, beside the inequalities (13), (14) of Section 1 and the corresponding ones where the role of rows and columns is interchanged, we find the new inequality:

$$E_P\left[\frac{s_a^2}{m-1} - \frac{\bar s_a^2}{n-1}\right] = -\frac{m}{n-1}\sum (\bar\alpha_\nu - \alpha)^2 < 0, \tag{14}$$

which of course corresponds to the Lexis inequality (11). The characteristic equalities are now:

$$E_P\left[\frac{s_a^2}{m-1}\right] = E_P\left[\frac{S^2}{(m-1)(n-1)}\right] = E_P\left[\frac{\bar s_w^2}{n(m-1)}\right]. \tag{15}$$

These equalities (12) and (15) can be used in testing the hypothesis of Lexis or Poisson series respectively in the same way as the equalities (8) for the Bernoulli case. We shall deal with the variances of these test functions in (d) of this section.

(b). Mathematical expectations of the quotients of certain test functions. We have seen that in case of a Lexis series the expectations of $\bar s_a^2/(n-1)$, of $S^2/(m-1)(n-1)$ and of $s_w^2/m(n-1)$ are equal. We will show that even in this case


9 See [10], pp. 81-90 for proofs of these inequalities for the case of normal populations.





















2 
Sa 
Ey i / nt _ fe 


Sp 
Ey = —1)/ (m—- ae — 1) 


e. & 
Ey Lis (m— 1)(n— a 


Ey l-—* —1)\(n— t— - | sas 


_& ’ 
= = ees ine ice 7 AS | 
a = T and | ee | T. As both 


$\bar T$ and T are of second degree in the $x_{\mu\nu}$ we may write:

$$\bar T - T = A + \sum\sum B_{\mu\nu} x_{\mu\nu} + \sum\sum C_{\mu\nu} x_{\mu\nu}^2 + \sum D_{\mu_1 i; \mu_2 j} \, x_{\mu_1 i} x_{\mu_2 j},$$

where the A, B, C, D are constants. The last sum contains ½·mn(mn − 1) terms and not both $\mu_1 = \mu_2$ and $i = j$ hold. Compute the expectation of $\bar T - T$ with respect to populations which form a Lexis series $V_{\mu\nu}(x) = V_\mu(x)$. Denote by $\alpha_\mu$, $\sigma_\mu^2$ the respective mean values and variances. We then have because of (11):


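$$0 = E_L[\bar T - T] = A + \sum_\mu \alpha_\mu \sum_\nu B_{\mu\nu} + \sum_\mu (\sigma_\mu^2 + \alpha_\mu^2) \sum_\nu C_{\mu\nu} + \sum \alpha_{\mu_1}\alpha_{\mu_2} D_{\mu_1 i; \mu_2 j}$$

or introducing $\sum_\nu B_{\mu\nu} = B_\mu$; $\sum_\nu C_{\mu\nu} = C_\mu$; $\sum_{i,j} D_{\mu_1 i; \mu_2 j} = D_{\mu_1 \mu_2}$ we get: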









(16) 





Let us write for the moment: 


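$$0 = E_L(\bar T - T) = A + \sum_\mu \alpha_\mu B_\mu + \sum_\mu (\sigma_\mu^2 + \alpha_\mu^2) C_\mu + \sum_{\mu_1 \le \mu_2} \alpha_{\mu_1}\alpha_{\mu_2} D_{\mu_1 \mu_2}.$$

As this equality is exact for an arbitrary set of $V_\mu(x)$ we deduce that A = 0, $B_\mu = 0$, $C_\mu = 0$, $D_{\mu_1 \mu_2} = 0$.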

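Let us now compute under the same assumption the expectation of $(\bar T - T)/T$. Here the expectations of 1/T, $x_{\mu\nu}/T$ etc. will take the place of the expectations of 1, $x_{\mu\nu}$, $\cdots$. But these new expectations will not depend on the index ν (index within the row) because the populations are the same within each row and because of the symmetry of T in the m·n variables $x_{\mu\nu}$. Hence we can put

$$E\left(\frac{1}{T}\right) = l_0, \qquad E\left(\frac{x_{\mu\nu}}{T}\right) = l_\mu, \qquad E\left(\frac{x_{\mu\nu}^2}{T}\right) = l'_\mu, \qquad E\left(\frac{x_{\mu_1 i} x_{\mu_2 j}}{T}\right) = l_{\mu_1 \mu_2} \ \text{etc.}$$

and we get

$$E\left[\frac{\bar T - T}{T}\right] = E\left(\frac{\bar T}{T} - 1\right) = A l_0 + \sum_\mu l_\mu B_\mu + \sum_\mu l'_\mu C_\mu + \sum_{\mu_1 \le \mu_2} l_{\mu_1 \mu_2} D_{\mu_1 \mu_2} = 0,$$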

because all the coefficients are equal to zero. Our theorem is thus proved. The 
same conclusion holds if the denominator—without being symmetric in all the 




m:n variables—does not depend on the row index. And as this last property 
holds for s‘, the expectations (16) are all shown to be equal to one. 

Analogous relations are valid for Poisson series. 

(c). Non-independent populations. We omit in this section the assumption of independence of the m·n populations but assume the theoretical population to be a general m·n-variate distribution:

$$V(x_{11}, x_{12}, \cdots x_{mn}). \tag{17'}$$

From $V(x_{11}, x_{12}, \cdots x_{mn})$ we derive the mn one-dimensional distributions $V_{\mu\nu}(x)$ ($\mu = 1, \cdots m$; $\nu = 1, \cdots n$) by letting all the variables except $x_{\mu\nu}$ tend to +∞, because $V_{\mu\nu}(x)$ is the probability that $x_{\mu\nu} \le x$ regardless of the values of the other variables. In a similar way we derive the ½mn(mn − 1) two-dimensional distributions $V_{\mu_1\nu_1; \mu_2\nu_2}(x, y)$, that is the probability that $x_{\mu_1\nu_1} \le x$ and $x_{\mu_2\nu_2} \le y$. We get this distribution from (17') as all the variables with the exception of $x_{\mu_1\nu_1}$ and $x_{\mu_2\nu_2}$ tend to +∞. We denote as before by $\alpha_{\mu\nu}$ and $\sigma_{\mu\nu}^2$ the expectation of $x_{\mu\nu}$ and $(x_{\mu\nu} - \alpha_{\mu\nu})^2$ respectively. But the expectation of $(x_{\mu_1\nu_1} - \alpha_{\mu_1\nu_1})(x_{\mu_2\nu_2} - \alpha_{\mu_2\nu_2})$ which was zero in case of the independence of $x_{\mu_1\nu_1}$ and $x_{\mu_2\nu_2}$ may now differ from zero. Denote by $\mathfrak{E}$ the expectation with respect to (17'). Then:

$$\mathfrak{E}[(x_{\mu_1\nu_1} - \alpha_{\mu_1\nu_1})(x_{\mu_2\nu_2} - \alpha_{\mu_2\nu_2})] = \int \cdots \int (x_{\mu_1\nu_1} - \alpha_{\mu_1\nu_1})(x_{\mu_2\nu_2} - \alpha_{\mu_2\nu_2}) \, dV(x_{11}, \cdots x_{mn})$$

$$= \int\!\!\int (x - \alpha_{\mu_1\nu_1})(y - \alpha_{\mu_2\nu_2}) \, dV_{\mu_1\nu_1; \mu_2\nu_2}(x, y) = R_{\mu_1\nu_1; \mu_2\nu_2} = R_{\mu_2\nu_2; \mu_1\nu_1}.$$

Let us first deduce a general formula for the expectation of a sample variance in the case of dependent populations. Let $P(y_1, \cdots y_r)$ be the distribution of r chance variables $y_1, \cdots y_r$ which have the average b. Denoting by $\beta_\rho$ the expectation of $y_\rho$ with respect to P, by β the average of the $\beta_\rho$, by $\sigma_\rho^2$ the expectation of $(y_\rho - \beta_\rho)^2$, by $R_{ij}$ that of $(y_i - \beta_i)(y_j - \beta_j)$ we find, without difficulty, for the expectation of the sample variance

$$\mathrm{Exp.}\left[\frac{1}{r}\sum_{\rho=1}^{r} (y_\rho - b)^2\right] = \int \cdots \int \frac{1}{r}\left[(y_1 - b)^2 + \cdots + (y_r - b)^2\right] dP(y_1, \cdots y_r)$$

$$= \frac{r-1}{r^2}\sum_{\rho=1}^{r} \sigma_\rho^2 + \frac{1}{r}\sum_\rho (\beta_\rho - \beta)^2 - \frac{2}{r^2}\sum_{i<j} R_{ij}. \tag{18}$$
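The formula for the expectation of a sample variance under dependence can be checked exactly on a small joint distribution. The sketch below is an illustration only: the function names and the three-point joint distribution are hypothetical, and the distribution is given as a list of (probability, outcome) pairs so that the variables are visibly dependent.

```python
from fractions import Fraction as F

def expectation(joint, f):
    """Expected value of f(y) under a finite joint distribution."""
    return sum(p * f(y) for p, y in joint)

def both_sides(joint):
    """Return (direct expectation of the sample variance, three-term formula)."""
    r = len(joint[0][1])
    beta = [expectation(joint, lambda y, k=k: y[k]) for k in range(r)]
    beta_bar = sum(beta) / r
    var = [expectation(joint, lambda y, k=k: (y[k] - beta[k]) ** 2)
           for k in range(r)]
    cov_sum = sum(
        expectation(joint,
                    lambda y, i=i, j=j: (y[i] - beta[i]) * (y[j] - beta[j]))
        for i in range(r) for j in range(i + 1, r)
    )

    def sample_variance(y):
        b = sum(y) / r
        return sum((v - b) ** 2 for v in y) / r

    lhs = expectation(joint, sample_variance)
    rhs = (F(r - 1, r * r) * sum(var)
           + sum((bk - beta_bar) ** 2 for bk in beta) / r
           - F(2, r * r) * cov_sum)
    return lhs, rhs

# Hypothetical joint distribution of three dependent variables.
joint = [
    (F(1, 2), (F(0), F(0), F(1))),
    (F(1, 4), (F(1), F(2), F(0))),
    (F(1, 4), (F(3), F(1), F(1))),
]
lhs, rhs = both_sides(joint)
```

The covariance term enters with a negative sign: positive covariances make the variables move together and so reduce the expected spread about their common average.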


Let us apply this result in the computation of the expectations of our test functions. It is not difficult to compute them in the general case of different mean values and variances. But we restrict ourselves to the consideration of certain particular cases. Take first the case where all the m·n mean values $\alpha_{\mu\nu}$ are equal to each other and likewise the m·n variances and the ½mn(mn − 1) covariances. Denoting these magnitudes by α, σ² and R, respectively, we see from (18) that:

$$\mathfrak{E}\left(\frac{s^2}{mn-1}\right) = \mathfrak{E}\left(\frac{s_a^2}{m-1}\right) = \mathfrak{E}\left(\frac{\bar s_a^2}{n-1}\right) = \mathfrak{E}\left(\frac{s_w^2}{m(n-1)}\right) = \mathfrak{E}\left(\frac{\bar s_w^2}{n(m-1)}\right) = \mathfrak{E}\left(\frac{S^2}{(m-1)(n-1)}\right) = \sigma^2 - R. \tag{19}$$








We have thus obtained the result that in the case of dependent populations, just 
described, the expectations of the six different test functions are still the same. 

Of course we may assume many other particular kinds of mutual dependence of the populations. The following assumption seems to be appropriate for problems where rows and columns play a different role: We consider dependence only within each row, that means we assume only the variables $x_{\mu 1}, x_{\mu 2}, \cdots x_{\mu n}$ as mutually dependent. The distribution (17') has then the following form:

$$V(x_{11}, \cdots, x_{mn}) = V_1(x_{11}, \cdots, x_{1n}) \, V_2(x_{21}, \cdots, x_{2n}) \cdots V_m(x_{m1}, \cdots, x_{mn}). \tag{20}$$

In the usual way we derive the m·n one-dimensional distributions $V_{\mu\nu}(x)$ and the ½mn(mn − 1) two-dimensional distributions $V_{\mu_1\nu_1; \mu_2\nu_2}(x, y)$. If $\mu_1 \ne \mu_2$ such a two-dimensional distribution reduces to the product of the respective one-dimensional distributions. Only the ½mn(n − 1) bivariate distributions derived from one and the same $V_\mu(x_{\mu 1}, \cdots, x_{\mu n})$ will not reduce in this way.

Denoting again by $\mathfrak{E}$ the expectation with respect to $V(x_{11}, \cdots, x_{mn})$ we find:

$$\mathfrak{E}[(x_{\mu_1 i} - \alpha_{\mu_1 i})(x_{\mu_2 j} - \alpha_{\mu_2 j})] = \begin{cases} 0 & \mu_1 \ne \mu_2, \\ R^{(\mu)}_{ij} & \mu_1 = \mu_2 = \mu \text{ and } i \ne j. \end{cases} \tag{21}$$
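Applying now formula (18) in the computation of the expectations of $s^2$, $s_w^2$, and $s_a^2$ we find:

$$\mathfrak{E}\Big[\sum\sum (x_{\mu\nu} - a)^2\Big] = \frac{mn-1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \sum\sum (\alpha_{\mu\nu} - \alpha)^2 - \frac{2}{mn}\sum_{\mu=1}^{m}\sum_{i<j} R^{(\mu)}_{ij},$$

$$\mathfrak{E}\Big[\sum\sum (x_{\mu\nu} - a_\mu)^2\Big] = \frac{n-1}{n}\sum\sum \sigma_{\mu\nu}^2 + \sum\sum (\alpha_{\mu\nu} - \alpha_\mu)^2 - \frac{2}{n}\sum_{\mu=1}^{m}\sum_{i<j} R^{(\mu)}_{ij}, \tag{22}$$

$$\mathfrak{E}\Big[n\sum (a_\mu - a)^2\Big] = \frac{m-1}{mn}\sum\sum \sigma_{\mu\nu}^2 + n\sum (\alpha_\mu - \alpha)^2 + \frac{2(m-1)}{mn}\sum_{\mu=1}^{m}\sum_{i<j} R^{(\mu)}_{ij}.$$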
















Let us now suppose that all the m·n distributions are equal to each other, or, at least, that:

$$\alpha_{\mu\nu} = \alpha. \tag{23}$$


This assumption, which is characterized by (21), is, of course, different from 
the one which leads us to (19). We find now by means of (22), if we set 


> ER; = RB and . 


mn(mn — 1) 


(24) ci oe | a. 
mn — 1 mn — 1 mn — 1 
Assuming R > 0 (positive average correlation) we may compare this result with (11) Section 1. The term on the right side of (24) is also of the same order of magnitude as that in (11). For negative R the term on the right side of (24) is negative and the equation may be compared with (13) Section 1. We see that for the test functions $s^2/r$ and $s_a^2/r_a$ “positive (negative) average correlation within rows” has the same effect as “Lexis (Poisson) series” of populations. Consider now the test functions $\bar s_a^2$ and $S^2$. We find


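$$\mathfrak{E}[\bar s_a^2] = \mathfrak{E}\Big[m\sum (\bar a_\nu - a)^2\Big] = \frac{n-1}{mn}\sum\sum \sigma_{\mu\nu}^2 + m\sum (\bar\alpha_\nu - \alpha)^2 - \frac{2}{mn}\sum_{\mu}\sum_{i<j} R^{(\mu)}_{ij}, \tag{25}$$

and

$$\mathfrak{E}[S^2] = \mathfrak{E}\Big[\sum\sum (x_{\mu\nu} - a_\mu - \bar a_\nu + a)^2\Big] = \frac{(m-1)(n-1)}{mn}\sum\sum \sigma_{\mu\nu}^2 + \sum\sum (\alpha_{\mu\nu} - \alpha_\mu - \bar\alpha_\nu + \alpha)^2 - \frac{2(m-1)}{mn}\sum_{\mu}\sum_{i<j} R^{(\mu)}_{ij}. \tag{25'}$$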
Assuming (23) we get: 
‘ 3 = 
2 oS a a ee 
- | = - 
and if R > 0: 


2 =2 =2 
i / e Sa ~ Sw ~ Sa 


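The first equality is analogous to (11) and (14) of Section 2 for positive or negative R respectively.¹⁰ We also get under the assumption (23)

$$\mathfrak{E}\left[\frac{\bar s_a^2}{n-1}\right] = \mathfrak{E}\left[\frac{S^2}{(m-1)(n-1)}\right] = \mathfrak{E}\left[\frac{s_w^2}{m(n-1)}\right].$$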


10 I have studied in another paper the combination of Lexis series and “positive correlation within rows.” It turns out that the two kinds of positive effects reinforce each other. The same is true for “negative correlation” and Poisson series. See [3].













These are the same equations as (12) Section 2, and they are true for either sign 
of R. Hence they provide no way to decide between Lexis series and correlated 
populations. But computing the expectations of the magnitudes which occur 
in (15) Section 2 we find from (22), (25) and (25’) 


“4 8 dat 3, a ae 


. S 2 
‘l@=ne=nl|-"-* 


And hence we may say: 

If the observed value of $s_a^2/(m-1)$ is greater than that of $\bar s_w^2/n(m-1)$, this can be
explained either by the assumption of a Lexis series or by a positive correlation within
rows; their equality indicates a Poisson series; and if the first is smaller than
the second we may assume negative correlation.
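As a hedged illustration of the comparison just stated, the following sketch computes the two observed values from an m × n table of numbers, taking $s_a^2 = n\sum_\mu (a_\mu - a)^2$ (row means against the grand mean) and $\bar s_w^2 = \sum\sum (x_{\mu\nu} - a_\nu)^2$ (cells against their column means). The function name and the sample table are assumptions made for this example. The text gives no significance threshold at this point, so the sketch only returns the two quantities.

```python
def compare_row_statistics(table):
    """Return (s_a^2/(m-1), sbar_w^2/(n(m-1))) for an m x n table given as a list of rows."""
    m, n = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (m * n)
    row_means = [sum(row) / n for row in table]
    col_means = [sum(table[u][v] for u in range(m)) / m for v in range(n)]
    # s_a^2 = n * sum over rows of (row mean - grand mean)^2
    s_a2 = n * sum((a - grand) ** 2 for a in row_means)
    # sbar_w^2 = sum over all cells of (x - column mean)^2
    sbar_w2 = sum((table[u][v] - col_means[v]) ** 2
                  for u in range(m) for v in range(n))
    return s_a2 / (m - 1), sbar_w2 / (n * (m - 1))

first, second = compare_row_statistics([[1, 2], [3, 4]])
print(first, second)  # 4.0 2.0
```

In this toy table the first value exceeds the second, which by the statement above is compatible with either a Lexis series or positive correlation within rows; whether an observed difference is significant is not settled by this sketch.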

In the same way we may explain 


- Ss 
E= | " leet be 


either by positive correlation or by Lexis series; whereas the equality indicates 
a Poisson series and the sign < indicates negative correlation. 

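(d). The variances of the test functions. We have still to find the variance of
our test functions. Let us compute the variance of

(28)    $$\sum_\mu\sum_\nu (x_{\mu\nu} - a_\mu - a_\nu + a)^2$$

with respect to the m·n dimensional distribution $V(x_{11})V(x_{12}) \cdots V(x_{mn})$.
Let us put

(29)    $$x_{\mu\nu} - a_\mu - a_\nu + a = y_{\mu\nu};$$

then we see that the average of the $y_{\mu\nu}$ equals zero,

$$\bar y = \frac{1}{mn}\sum_\mu\sum_\nu y_{\mu\nu} = a - \frac{1}{mn}\, n\sum_\mu a_\mu - \frac{1}{mn}\, m\sum_\nu a_\nu + a = 0,$$

and

$$S^2 = \sum_\mu\sum_\nu (x_{\mu\nu} - a_\mu - a_\nu + a)^2 = \sum_\mu\sum_\nu (y_{\mu\nu} - \bar y)^2.$$

Each $y_{\mu\nu}$ is a linear function of the $x_{\mu\nu}$, e.g.

(30)    $$\begin{aligned}
y_{11} &= \frac{(m-1)(n-1)}{mn}\, x_{11} - \frac{m-1}{mn}\sum_{j=2}^{n} x_{1j} - \frac{n-1}{mn}\sum_{i=2}^{m} x_{i1} + \frac{1}{mn}\sum_{i=2}^{m}\sum_{j=2}^{n} x_{ij} \\
&= \lambda_1 x_{11} + \lambda_2 \sum_{j=2}^{n} x_{1j} + \lambda_3 \sum_{i=2}^{m} x_{i1} + \lambda_4 \sum_{i=2}^{m}\sum_{j=2}^{n} x_{ij}.
\end{aligned}$$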





















Using the same notations as in Section 1 (c) we find, because of the independence
of each chance variable,

(31')   $$\operatorname{Var}(y_{11}) = \lambda_1^2\sigma^2 + \lambda_2^2 (n-1)\sigma^2 + \lambda_3^2 (m-1)\sigma^2 + \lambda_4^2 (m-1)(n-1)\sigma^2 = \frac{(m-1)(n-1)}{mn}\,\sigma^2,$$

and we find the same result for each $y_{\mu\nu}$:

(31)    $$\sigma'^2 = \operatorname{Var}(y_{\mu\nu}) = \frac{(m-1)(n-1)}{mn}\,\sigma^2,$$

in agreement with the fourth line of (7) of this section. We still need $M_4'$, the
fourth moment about the mean of $y_{\mu\nu}$, which we can compute from the fourth
moment of a sum. We find





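(32)    $$M_4' = A\,M_4 + 6B\,\sigma^4,$$

and we have

(33)    $$A = \lambda_1^4 + (n-1)\lambda_2^4 + (m-1)\lambda_3^4 + (m-1)(n-1)\lambda_4^4 = \frac{(m-1)(n-1)}{m^3 n^3}\,(m^2 - 3m + 3)(n^2 - 3n + 3),$$

and

(34')   $$\begin{aligned}
B ={}& \lambda_1^2\{\lambda_2^2 (n-1) + \lambda_3^2 (m-1) + \lambda_4^2 (m-1)(n-1)\} \\
&+ \lambda_2^2 (n-1)\{\tfrac12\lambda_2^2 (n-2) + \lambda_3^2 (m-1) + \lambda_4^2 (m-1)(n-1)\} \\
&+ \lambda_3^2 (m-1)\{\tfrac12\lambda_3^2 (m-2) + \lambda_4^2 (m-1)(n-1)\} \\
&+ \tfrac12\lambda_4^4 (m-1)(n-1)[(m-1)(n-1) - 1].
\end{aligned}$$

If we introduce the values of $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ we find

(34)    $$\begin{aligned}
m^4 n^4 B ={}& (m-1)^3 (n-1)^3 (m+n) + (m-1)^2 (n-1)^2 (m+n-2) \\
&+ \tfrac12 (m-1)(n-1)[(m-1)^3 (n-2) + (n-1)^3 (m-2) + (mn - m - n)];
\end{aligned}$$

this expression as well as that of A may be easily computed for different values
of m and n.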
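The remark that A and B are easily computed can be made concrete. The sketch below is an illustration and not the author's computation: it lists the mn coefficients with which the $x_{ij}$ enter $y_{11}$, forms A as the sum of their fourth powers and B as the sum, over all pairs of coefficients, of the products of their squares, and compares both with the closed forms (33) and (34). The function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

def coefficients(m, n):
    """Coefficients of the x_ij in y_11 = x_11 - (row mean) - (column mean) + (grand mean)."""
    l1 = F((m - 1) * (n - 1), m * n)   # coefficient of x_11
    l2 = F(-(m - 1), m * n)            # rest of row 1: n - 1 terms
    l3 = F(-(n - 1), m * n)            # rest of column 1: m - 1 terms
    l4 = F(1, m * n)                   # remaining (m - 1)(n - 1) terms
    return [l1] + [l2] * (n - 1) + [l3] * (m - 1) + [l4] * ((m - 1) * (n - 1))

def A_B_direct(m, n):
    c = coefficients(m, n)
    A = sum(x ** 4 for x in c)
    B = sum(x ** 2 * y ** 2 for x, y in combinations(c, 2))
    return A, B

def A_B_closed(m, n):
    # Closed forms (33) and (34).
    A = F((m - 1) * (n - 1) * (m * m - 3 * m + 3) * (n * n - 3 * n + 3),
          m ** 3 * n ** 3)
    num = ((m - 1) ** 3 * (n - 1) ** 3 * (m + n)
           + (m - 1) ** 2 * (n - 1) ** 2 * (m + n - 2)
           + F(1, 2) * (m - 1) * (n - 1)
           * ((m - 1) ** 3 * (n - 2) + (n - 1) ** 3 * (m - 2) + (m * n - m - n)))
    return A, num / (m * n) ** 4

for m, n in [(2, 2), (3, 4), (5, 7)]:
    assert A_B_direct(m, n) == A_B_closed(m, n)
    # Sum of squared coefficients: the variance factor of (31).
    assert sum(x * x for x in coefficients(m, n)) == F((m - 1) * (n - 1), m * n)

print(A_B_closed(3, 4))
```

The pairwise sum reflects the fact that, for independent variables with zero mean, the fourth moment of a linear form is the sum of the fourth powers of the coefficients times $M_4$ plus six times the pairwise products of squared coefficients times $\sigma^4$, which is the structure of (32).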


If m and n are large, B is of order $\frac{1}{m} + \frac{1}{n}$; from (31)–(34) we see that in this
case $\sigma'^2$ is approximately equal to $\sigma^2$ and $M_4'$ to $M_4$.
Using now (18') we find finally

$$\operatorname{Var}\Big\{\sum_\mu\sum_\nu (x_{\mu\nu} - a_\mu - a_\nu + a)^2\Big\} = \frac{mn-1}{mn}\,\{(mn-1)M_4' - (mn-3)\sigma'^4\},$$







where $M_4'$ and $\sigma'^2$ are the expressions just computed. If we compare the variances
of the test functions $s_a^2/(m-1)$ and $S^2/(m-1)(n-1)$ we see that whereas
the variance of the first expression is of order 1/m that of the second is of order
1/mn. Hence for large values of n the latter expression is more exact than the
former (see the analogous remark Section 1 (c)). A similar statement can be
made if $\bar s_a^2/(n-1)$ takes the place of $s_a^2/(m-1)$.


3. Bivariate distributions. Analysis of covariance. 


(a). Problem. Suppose m persons are each throwing two dice n times; we observe
the respective numbers on each die in these m·n trials. Or we observe on m
groups of n persons the color of the hair and of the eyes. Or else we record for
n years the yield of wheat (in bushels) per acre and the production cost (per
bushel) for m farms; etc.

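We consider m·n pairs of numbers $x_{\mu\nu}, y_{\mu\nu}$. Let $V_{\mu\nu}(x, y)$¹¹ be the
probability that $x_{\mu\nu} \le x$ and $y_{\mu\nu} \le y$; $V_{\mu\nu}(x, +\infty) = V^{(1)}_{\mu\nu}(x)$, $V_{\mu\nu}(+\infty, y) = V^{(2)}_{\mu\nu}(y)$, and introduce the following mean values and variances

(1)    $$\iint x\, dV_{\mu\nu}(x, y) = \alpha_{\mu\nu}, \qquad \iint y\, dV_{\mu\nu}(x, y) = \beta_{\mu\nu},$$

(2)    $$\iint (x - \alpha_{\mu\nu})^2\, dV_{\mu\nu}(x, y) = \sigma_{\mu\nu}^2, \qquad \iint (y - \beta_{\mu\nu})^2\, dV_{\mu\nu}(x, y) = \tau_{\mu\nu}^2,$$

(3)    $$\iint (x - \alpha_{\mu\nu})(y - \beta_{\mu\nu})\, dV_{\mu\nu}(x, y) = \gamma_{\mu\nu},$$

(4)    $$\begin{aligned}
\frac{1}{n}\sum_\nu \alpha_{\mu\nu} &= \alpha_\mu, & \frac{1}{m}\sum_\mu \alpha_{\mu\nu} &= \alpha_\nu, & \frac{1}{mn}\sum_\mu\sum_\nu \alpha_{\mu\nu} &= \alpha, \\
\frac{1}{n}\sum_\nu \beta_{\mu\nu} &= \beta_\mu, & \frac{1}{m}\sum_\mu \beta_{\mu\nu} &= \beta_\nu, & \frac{1}{mn}\sum_\mu\sum_\nu \beta_{\mu\nu} &= \beta.
\end{aligned}$$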
Let us compute the mathematical expectations of certain test functions with
respect to the 2mn-dimensional distributions
$V_{11}(x_{11}, y_{11})\, V_{12}(x_{12}, y_{12}) \cdots V_{mn}(x_{mn}, y_{mn})$. Let

(5)    $$\mathcal{E}[F(x_{11}, y_{11}, \ldots, x_{mn}, y_{mn})] = \int \cdots \int F(x_{11}, \ldots, y_{mn})\, dV_{11}(x_{11}, y_{11}) \cdots dV_{mn}(x_{mn}, y_{mn}).$$


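¹¹ In the particular case where $V_{\mu\nu}(x, y)$ has everywhere a derivative $\dfrac{\partial^2 V_{\mu\nu}}{\partial x\, \partial y}$, we can use the
two dimensional density $v_{\mu\nu}(x, y) = \dfrac{\partial^2 V_{\mu\nu}}{\partial x\, \partial y}$ and the one-dimensional densities

$$v^{(1)}_{\mu\nu}(x) = \int v_{\mu\nu}(x, y)\, dy; \qquad v^{(2)}_{\mu\nu}(y) = \int v_{\mu\nu}(x, y)\, dx$$

and we have

$$V^{(1)}_{\mu\nu}(x) = \int_{-\infty}^{x} v^{(1)}_{\mu\nu}(\xi)\, d\xi, \qquad V^{(2)}_{\mu\nu}(y) = \int_{-\infty}^{y} v^{(2)}_{\mu\nu}(\eta)\, d\eta.$$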







We then have¹²

(5')   $$\mathcal{E}\{G(x_{11}, \ldots, x_{mn})\} = \int \cdots \int G(x_{11}, \ldots, x_{mn})\, dV^{(1)}_{11}(x_{11}) \cdots dV^{(1)}_{mn}(x_{mn}).$$


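In analogy with previous notations we introduce

(6)    $$\begin{aligned}
a_\mu &= \frac{1}{n}\sum_\nu x_{\mu\nu}, & a_\nu &= \frac{1}{m}\sum_\mu x_{\mu\nu}, & a &= \frac{1}{mn}\sum_\mu\sum_\nu x_{\mu\nu}, \\
b_\mu &= \frac{1}{n}\sum_\nu y_{\mu\nu}, & b_\nu &= \frac{1}{m}\sum_\mu y_{\mu\nu}, & b &= \frac{1}{mn}\sum_\mu\sum_\nu y_{\mu\nu},
\end{aligned}$$

(7)    $$\begin{aligned}
s^2 &= \sum\sum (x_{\mu\nu} - a)^2, & s_a^2 &= n\sum_\mu (a_\mu - a)^2, & s_w^2 &= \sum\sum (x_{\mu\nu} - a_\mu)^2, \\
S^2 &= \sum\sum (x_{\mu\nu} - a_\mu - a_\nu + a)^2, & \bar s_a^2 &= m\sum_\nu (a_\nu - a)^2, & \bar s_w^2 &= \sum\sum (x_{\mu\nu} - a_\nu)^2, \\
t^2 &= \sum\sum (y_{\mu\nu} - b)^2, & t_a^2 &= n\sum_\mu (b_\mu - b)^2, & t_w^2 &= \sum\sum (y_{\mu\nu} - b_\mu)^2, \\
T^2 &= \sum\sum (y_{\mu\nu} - b_\mu - b_\nu + b)^2, & \bar t_a^2 &= m\sum_\nu (b_\nu - b)^2, & \bar t_w^2 &= \sum\sum (y_{\mu\nu} - b_\nu)^2,
\end{aligned}$$

(8)    $$\begin{aligned}
c &= \sum\sum (x_{\mu\nu} - a)(y_{\mu\nu} - b), & C &= \sum\sum (x_{\mu\nu} - a_\mu - a_\nu + a)(y_{\mu\nu} - b_\mu - b_\nu + b), \\
c_a &= n\sum_\mu (a_\mu - a)(b_\mu - b), & c_w &= \sum\sum (x_{\mu\nu} - a_\mu)(y_{\mu\nu} - b_\mu), \\
\bar c_a &= m\sum_\nu (a_\nu - a)(b_\nu - b), & \bar c_w &= \sum\sum (x_{\mu\nu} - a_\nu)(y_{\mu\nu} - b_\nu);
\end{aligned}$$

we then have

(9)    $$\begin{aligned}
s^2 &= S^2 + s_a^2 + \bar s_a^2 = s_a^2 + s_w^2 = \bar s_a^2 + \bar s_w^2, \\
t^2 &= T^2 + t_a^2 + \bar t_a^2 = t_a^2 + t_w^2 = \bar t_a^2 + \bar t_w^2, \\
c &= C + c_a + \bar c_a = c_a + c_w = \bar c_a + \bar c_w,
\end{aligned}$$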
and corresponding relations for the ranks of these quadratic forms. We find
for the expectations of these test functions, in analogy with previously investigated
formulae:

_. ZZ (By irr B)’, 


mn —1 


DEO — a)(Byy _ 8), 


x NZ( Oy aa a)(Byy ao B), 


eh el 
mn 
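The decompositions (9) hold identically for any table of pairs, which can be confirmed numerically. The sketch below is an illustration under assumed data (the two small integer tables are hypothetical, as are all names): it computes the bilinear forms $c$, $C$, $c_a$, $\bar c_a$, $c_w$, $\bar c_w$ with exact rational arithmetic and checks the three expressions for $c$. Taking y = x gives the corresponding identities for $s^2$.

```python
from fractions import Fraction as F

def bilinear_forms(x, y):
    """Return the forms c, C, c_a, cbar_a, c_w, cbar_w for m x n tables x and y."""
    m, n = len(x), len(x[0])

    def means(t):
        row = [F(sum(t[u]), n) for u in range(m)]
        col = [F(sum(t[u][v] for u in range(m)), m) for v in range(n)]
        return row, col, F(sum(map(sum, t)), m * n)

    a_r, a_c, a = means(x)
    b_r, b_c, b = means(y)
    idx = [(u, v) for u in range(m) for v in range(n)]
    return {
        "c": sum((x[u][v] - a) * (y[u][v] - b) for u, v in idx),
        "C": sum((x[u][v] - a_r[u] - a_c[v] + a)
                 * (y[u][v] - b_r[u] - b_c[v] + b) for u, v in idx),
        "c_a": n * sum((a_r[u] - a) * (b_r[u] - b) for u in range(m)),
        "cbar_a": m * sum((a_c[v] - a) * (b_c[v] - b) for v in range(n)),
        "c_w": sum((x[u][v] - a_r[u]) * (y[u][v] - b_r[u]) for u, v in idx),
        "cbar_w": sum((x[u][v] - a_c[v]) * (y[u][v] - b_c[v]) for u, v in idx),
    }

x = [[1, 4, 2], [3, 0, 5]]   # hypothetical observations x_{mu nu}
y = [[2, 2, 7], [1, 6, 3]]   # hypothetical observations y_{mu nu}
f = bilinear_forms(x, y)
assert f["c"] == f["C"] + f["c_a"] + f["cbar_a"]
assert f["c"] == f["c_a"] + f["c_w"]
assert f["c"] == f["cbar_a"] + f["cbar_w"]

g = bilinear_forms(x, x)     # with y = x the forms reduce to s^2, S^2, s_a^2, ...
assert g["c"] == g["C"] + g["c_a"] + g["cbar_a"]
```

Because the identities are purely algebraic, they do not depend on any assumption about the distributions $V_{\mu\nu}$; only the expectations of the individual forms do.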


¹² It may be mentioned that the problem considered in this section of mn bivariate
distributions $v_{\mu\nu}(x, y)$ constitutes, of course, only a particular case of dependence (see
Section 2, (c)) for a 2mn dimensional population $v(x_{11}, y_{11}, x_{12}, y_{12}, \ldots, x_{mn}, y_{mn})$.







1) If all the $\alpha_{\mu\nu}$ equal each other, or all the $\beta_{\mu\nu}$ equal each other, we find:


Ca a fe 
| ’ li | “— E- | 


_p C _ x Ca hs Cw = 2" 


These formulae provide us with unbiased estimates of $\sum_\mu\sum_\nu \gamma_{\mu\nu}$.

2) The $\alpha_{\mu\nu}$ are equal within each row but differ from row to row (Lexis), $\alpha_{\mu\nu} = \alpha_\mu \neq \alpha$; $\alpha_\nu = \alpha$, whereas the $\beta_{\mu\nu}$ may have arbitrary values; then

(13)    $$\mathcal{E}\Big[\frac{\bar c_a}{n-1}\Big] = \mathcal{E}\Big[\frac{c_w}{m(n-1)}\Big] = \mathcal{E}\Big[\frac{C}{(m-1)(n-1)}\Big].$$

The same equalities are valid for arbitrary $\alpha_{\mu\nu}$ if the $\beta_{\mu\nu} = \beta_\mu$; $\beta_\nu = \beta$. Our
new equalities may be of some interest because inequalities analogous to those
of the Lexis case cannot be proved for covariances. If the observed values of
the expressions in (13) are significantly different we may conclude that neither
the $\alpha_{\mu\nu}$ nor the $\beta_{\mu\nu}$ form a Lexis series. A judgment of the test (13) might be
based on the investigation of its power function. But besides we have the 
equalities (12) and analogous equalities containing C.Paedc. 


3) If either oe a, ~ a, ay, = a, 


or Boo es B, ¥# B, Bu = B. 


We have the new equalities 


. Ca _ ee a C 
a) Blea ]= Lame =a] 2 lee) 


and there are no inequalities analogous to the inequalities (14) of Section 2, and 
(13), (14) of Section 1. 

Most of the investigations of Sections 1 and 2 can be generalized for this two 
dimensional problem. 


BIBLIOGRAPHY 


[1] R. A. Fisher, Statistical Methods for Research Workers, 6th ed., p. 214 ff.

[2] R. A. Fisher, “Applications of ‘Student’s’ distributions,” Metron, Vol. 5 (1926),
pp. 90-104.

[3] H. Geiringer, “A new explanation of non-normal dispersion in the Lexis theory,”
Econometrica, Vol. 10 (1942), pp. 53-60.

[4] F. R. Helmert, Zeits. für Math. und Physik, Vol. 21 (1876), pp. 192-218.

[5] J. O. Irwin, “Mathematical theorems involved in the analysis of variance,” Jour.
Roy. Stat. Soc., Vol. 94 (1931), pp. 284-300.

[6] W. G. Madow, “Limiting distributions of quadratic and bilinear forms,” Annals of
Math. Stat., Vol. 11 (1940), pp. 125-147.

[7] R. v. Mises, “Théorie des probabilités. Fondements et applications,” Annales de
l’Institut Poincaré, (1931), pp. 137-190.







[8] H. L. Rietz, “On the Lexis theory and the analysis of variance,” Bull. Am. Math.
Soc., (1932), pp. 731 ff.

[9] A. A. Tschuprow, Skandinavisk Aktuarietidskrift, Vol. 6 (1918).

[10] A. Wald, Lectures on the Analysis of Variance and Covariance, Columbia University,
1941.

[11] Milton Friedman, “The use of ranks to avoid the assumption of normality,” Jour.
Amer. Stat. Assn., Vol. 32 (1937), pp. 675-701.





